Abstract

Recently, Bilu and Linial [8] formalized an implicit assumption often made when choosing a clustering
objective: that the optimum clustering to the objective should be preserved under small multiplicative
perturbations to distances between points. They showed that for max-cut clustering it is possible to
circumvent NP-hardness and obtain polynomial-time algorithms for instances resilient to large (factor
$O(\sqrt{n})$) perturbations, and subsequently Awasthi et al. [2] considered center-based objectives, giving
algorithms for instances resilient to $O(1)$ factor perturbations.

In this paper, we greatly advance this line of work. For the k-median objective, we present an
algorithm that can optimally cluster instances resilient to $(1 + \sqrt{2})$-factor perturbations, solving an open
problem of Awasthi et al. [2]. We additionally give algorithms for a more relaxed assumption in which we
allow the optimal solution to change in a small $\epsilon$ fraction of the points after perturbation. We give the first
bounds known for this more realistic and more general setting. We also provide positive results for
min-sum clustering, which is generally a much harder objective than k-median (and also non-center-based).
Our algorithms are based on new linkage criteria that may be of independent interest.

Additionally, we give sublinear-time algorithms that can return an implicit clustering
from access to only a small random sample.
1 Introduction



Problems of clustering data from pairwise distance information are ubiquitous in science. A common
approach for solving such problems is to view the data points as nodes in a weighted graph (with the weights
based on the given pairwise information), and then to design algorithms to optimize various objective
functions such as k-median or min-sum. For example, in the k-median clustering problem the goal is to partition
the data into $k$ clusters $C_i$, giving each a center $c_i$, in order to minimize the sum of the distances of all data
points to the centers of their cluster. In the min-sum clustering approach the goal is to find $k$ clusters $C_i$
that minimize the sum of all intra-cluster pairwise distances. Unfortunately, for most natural clustering
objectives, finding the optimal solution to the objective function is NP-hard. As a consequence, there has
been substantial work on approximation algorithms [12, 9, 7, 10, 1] with both upper and lower bounds on
the approximability of these objective functions on worst case instances.

Recently, Bilu and Linial [8] suggested an exciting, alternative approach aimed at understanding the
complexity of clustering instances which arise in practice. Motivated by the fact that distances between data
points in clustering instances are often based on a heuristic measure, they argue that interesting instances
should be resilient to small perturbations in these distances. In particular, if small perturbations can cause
the optimum clustering for a given objective to change drastically, then that probably is not a meaningful
objective to be optimizing. Bilu and Linial [8] specifically define an instance to be $\alpha$-perturbation resilient¹
for an objective $\Phi$ if perturbing pairwise distances by multiplicative factors in the range $[1, \alpha]$ does not
change the optimum clustering under $\Phi$.² They consider in detail the case of max-cut clustering and give
an efficient algorithm to recover the optimum when the instance is resilient to perturbations on the order of
$\alpha = O(\sqrt{n})$.

Two important questions raised by the work of Bilu and Linial [8] are: (1) the degree of resilience
needed for their algorithm to succeed is quite high: can one develop algorithms for important clustering
objectives that require much less resilience? (2) the resilience definition requires the optimum solution to
remain exactly the same after perturbation: can one succeed under weaker conditions? In the context of
center-based clustering objectives such as k-median and k-center, Awasthi et al. [2] partially address the
first of these questions and show that an algorithm based on the single-linkage heuristic can be used to find the
optimal clustering for $\alpha$-perturbation-resilient instances for $\alpha = 3$. They also conjecture it to be NP-hard to
beat 3 and prove beating 3 is NP-hard for a related notion.

In this work, we address both questions raised by [8] and additionally improve over [2]. First, for the
k-median problem we design a polynomial time algorithm for finding the optimum solution for instances
resilient to perturbations of value $\alpha = 1 + \sqrt{2}$, thus beating the previously best known factor of 3 of Awasthi
et al. [2]. Second, we consider a weaker, relaxed, and more realistic notion of perturbation-resilience where
we allow the optimal clustering of the perturbed instance to differ from the optimal of the original in a
small $\epsilon$ fraction of the points. This is arguably a more natural though also more difficult condition to deal
with. We give positive results for this case as well, showing for somewhat larger values of $\alpha$ that we can
still achieve a near-optimal clustering on the given instance (see the "Our Results" paragraph below for precise results). We
additionally give positive results for min-sum clustering, which is generally a much harder objective than
k-median (and also non-center-based). For example, the best known guarantee for min-sum clustering on
worst-case instances is an $O(\delta^{-1} \log^{1+\delta} n)$-approximation algorithm that runs in time $n^{O(1/\delta)}$ due to Bartal
et al. [7]; by contrast, the best guarantee known for k-median is factor $3 + \epsilon$.

Our results are achieved by carefully deriving structural properties of perturbation-resilience. At a high
level, all the algorithms we introduce work by first running appropriate linkage procedures to produce a
hierarchical clustering, and then running dynamic programming to retrieve the best k-clustering present in
the tree. To ensure that (on perturbation-resilient instances) the hierarchy output in the first step has a
pruning of low cost, we derive new linkage procedures (closure linkage and approximate closure linkage)
which are of independent interest. While the overall analysis is quite involved, the clustering algorithms we
devise are simple and robust. This simplicity and robustness allow us to show how our algorithms can be
made sublinear-time by returning an implicit clustering from only a small random sample of the input.

¹Bilu and Linial [8] refer to such instances as perturbation stable instances.

²Of course, the score of the optimum solution will change; what the definition requires is that the partitioning induced by the
optimum remains the same.

From a learning theory perspective, the resilience parameter $\alpha$ can also be seen as an analog to a margin
for clustering. In supervised learning, the margin of a data point is the distance, after scaling, between the
data point and the decision boundary of its classifier, and many algorithms have stronger guarantees when
the smallest margin over the entire data set is sufficiently large [17, 18]. The $\alpha$ parameter similarly controls
the magnitude of the perturbation the data can withstand before being clustered differently, which is, in
essence, the data's distance to the decision boundary for the given clustering objective. Hence, perturbation
resilience is also a natural and interesting assumption to study from a learning theory perspective.

Our Results: In this paper, we greatly advance the line of work of [8] by solving a number of important
problems of clustering perturbation-resilient instances under metric k-median and min-sum objectives.

In Section 3 we improve on the bounds of [2] for $\alpha$-perturbation resilient instances for the k-median
objective, giving an algorithm that efficiently³ finds the optimum clustering for $\alpha = 1 + \sqrt{2}$. The k-median
problem is NP-hard even to approximate, yet we can recover the exact solution for perturbation resilient instances. Our
algorithm is based on a new linkage procedure using a new notion of distance (closure distance) between
sets that may be of independent interest.

In Section 4 we consider the more challenging and more general notion of $(\alpha, \epsilon)$-perturbation resilience,
where we allow the optimal solution after perturbation to be $\epsilon$-close to the original. We provide an efficient
algorithm which for $\alpha > 2 + \sqrt{7}$ produces a $(1 + O(\epsilon/\rho))$-approximation to the optimum, where $\rho$ is the
fraction of the points in the smallest cluster. The key structural property we derive and exploit is that,
except for $\epsilon n$ bad points, all points are a factor $\alpha$ closer to their own center than to any other center. Using
this fact, we then design an approximate version of the closure linkage criterion that allows us to carefully
eliminate the noise introduced by the bad points and construct a tree that has a low-cost pruning that is a
good approximation to the optimum.

In Section 5 we provide the first efficient algorithm for optimally clustering $\alpha$-min-sum perturbation
resilient instances. Our algorithm is based on an appropriate modification of average linkage that exploits
the structure of min-sum perturbation resilient instances.

We also provide sublinear-time algorithms for both the k-median and min-sum objectives (Sections 4.3
and 5) that can return an implicit clustering from access to only a small random sample.

Related Work: In the context of objective based clustering, several recent papers have shown how to
exploit various notions of stability for overcoming the existing hardness results on worst case instances. 
These include the stability notion of Ostrovsky et al. 1151 El that assumes that the instance has the property 
that the cost of the optimal k-means solution is small compared to the cost of the optimal $(k - 1)$-means
solution and the approximation stability condition of Balcan et al. LSJ that assumes that every nearly optimal 
solution is close to the target clustering. 

Even closer to our work, several recent papers have shown how to exploit the structure of perturbation
resilient instances in order to obtain better approximation guarantees (than those possible on worst case 
instances) for other difficult optimization problems. These include the game theoretic problem of finding 
Nash equilibria 1111131 and the classic traveling salesman problem [14].

³For clarity, in this paper efficient means polynomial in both $n$ (the number of points) and $k$ (the number of clusters).
2 Notation and Preliminaries



In a clustering instance, we are given a set $S$ of $n$ points in a finite metric space, and we denote by
$d : S \times S \to \mathbb{R}_{\ge 0}$ the distance function. $\Phi$ denotes the objective function over a partition of $S$ into $k < n$ clusters
which we want to optimize over the metric, i.e. $\Phi$ assigns a score to every clustering. The optimal clustering
w.r.t. $\Phi$ is denoted as $\mathcal{C} = \{C_1, C_2, \ldots, C_k\}$, and its cost is denoted as $OPT$. The core concept we study in
this paper is the perturbation resilience notion introduced by [8]. Formally:

Definition 1. A clustering instance $(S, d)$ is $\alpha$-perturbation resilient to a given objective $\Phi$ if for any
function $d' : S \times S \to \mathbb{R}_{\ge 0}$ s.t. $\forall p, q \in S$, $d(p, q) \le d'(p, q) \le \alpha\, d(p, q)$, there is a unique optimal
clustering $\mathcal{C}'$ for $\Phi$ under $d'$ and this clustering is equal to the optimal clustering $\mathcal{C}$ for $\Phi$ under $d$.

In this paper, we focus on the k-median and min-sum objectives. For the k-median objective, we partition
$S$ into $k$ disjoint subsets $\mathcal{P} = \{P_1, P_2, \ldots, P_k\}$ and assign a set of centers $\mathbf{p} = \{p_1, p_2, \ldots, p_k\} \subseteq S$
for the subsets. The goal is to minimize $\Phi(\mathcal{P}, \mathbf{p}) = \sum_{i=1}^{k} \sum_{p \in P_i} d(p, p_i)$. The centers in the optimal
clustering are denoted as $\mathbf{c} = \{c_1, \ldots, c_k\}$. Clearly, in an optimal solution, each point is assigned to its nearest
center. In such cases, the objective is denoted as $\Phi(\mathbf{c})$. For the min-sum objective, we partition $S$ into $k$
disjoint subsets $\mathcal{P} = \{P_1, P_2, \ldots, P_k\}$, and the goal is to minimize $\Phi(\mathcal{P}) = \sum_{i=1}^{k} \sum_{p, q \in P_i} d(p, q)$. Note
that we sometimes denote $\Phi$ as $\Phi_S$ in the case where the distinction is necessary, such as in Section 4.3.
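The two objectives can be made concrete with a small sketch. It assumes the points are indexed $0, \ldots, n-1$ and the metric is given as a matrix `d`; the function names are illustrative and do not come from the paper. The min-sum cost sums over ordered pairs, matching the double sum as written above.

```python
def kmedian_cost(d, clusters, centers):
    """Sum over clusters of the distances from each point to its cluster's center."""
    return sum(d[p][c] for pts, c in zip(clusters, centers) for p in pts)


def best_center(d, pts):
    """Point of S (any index of d) minimizing the total distance to the cluster pts."""
    return min(range(len(d)), key=lambda c: sum(d[p][c] for p in pts))


def minsum_cost(d, clusters):
    """Sum of intra-cluster distances over ordered pairs (p, q)."""
    return sum(d[p][q] for pts in clusters for p in pts for q in pts)


# Four points on a line at positions 0, 1, 10, 11 (a toy metric).
pos = [0, 1, 10, 11]
d = [[abs(a - b) for b in pos] for a in pos]
clusters = [[0, 1], [2, 3]]
centers = [best_center(d, c) for c in clusters]
print(kmedian_cost(d, clusters, centers), minsum_cost(d, clusters))
```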

In Section 4 we consider a generalization of Definition 1 in which we allow a small difference between
the original optimum and the new optimum after perturbation. Formally:

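Definition 2. Let $\mathcal{C}$ be the optimal k-clustering and $\mathcal{C}'$ be another k-clustering of a set of $n$ points. We say
$\mathcal{C}'$ is $\epsilon$-close to $\mathcal{C}$ if $\min_{\sigma \in S_k} \sum_{i=1}^{k} |C_i \setminus C'_{\sigma(i)}| \le \epsilon n$, where $\sigma$ is a matching between indices of clusters of
$\mathcal{C}'$ and those of $\mathcal{C}$.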
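The closeness measure in Definition 2 can be evaluated by brute force on small examples. The sketch below tries all $k!$ matchings, which is only sensible for tiny $k$; the names `closeness` and `is_eps_close` are hypothetical.

```python
from itertools import permutations


def closeness(opt, other):
    """min over matchings sigma of sum_i |C_i \\ C'_sigma(i)| (brute force over k!)."""
    k = len(opt)
    opt = [set(c) for c in opt]
    other = [set(c) for c in other]
    return min(
        sum(len(opt[i] - other[sigma[i]]) for i in range(k))
        for sigma in permutations(range(k))
    )


def is_eps_close(opt, other, eps, n):
    """True if `other` is eps-close to `opt` on a ground set of n points."""
    return closeness(opt, other) <= eps * n


# Point 2 has moved from the first optimal cluster to the other one.
opt = [{0, 1, 2}, {3, 4, 5}]
other = [{2, 3, 4, 5}, {0, 1}]
print(closeness(opt, other))
```

For larger $k$ the minimum over matchings could instead be found with a minimum-cost bipartite matching routine.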

Definition 3. A clustering instance $(S, d)$ is $(\alpha, \epsilon)$-perturbation resilient to a given objective $\Phi$ if for any
function $d' : S \times S \to \mathbb{R}_{\ge 0}$ s.t. $\forall p, q \in S$, $d(p, q) \le d'(p, q) \le \alpha\, d(p, q)$, the optimal clustering $\mathcal{C}'$ for $\Phi$
under $d'$ is $\epsilon$-close to the optimal clustering $\mathcal{C}$ for $\Phi$ under $d$.

For $A, B \subseteq S$ we define $d_{sum}(A, B) = \sum_{p \in A} \sum_{q \in B} d(p, q)$ and $d_{sum}(p, B) = d_{sum}(\{p\}, B)$. For
simplicity, we will sometimes assume that $\min_i |C_i|$ is known. (Otherwise, we can simply search over the $n$
possible different values.)

3 $\alpha$-Perturbation Resilience for the k-Median Objective

In this section we show that, for $\alpha \ge 1 + \sqrt{2}$, if the clustering instance is $\alpha$-perturbation resilient for the
k-median clustering problem, then we can in polynomial time find the optimal k-median clustering. This
improves on the $\alpha \ge 3$ bound of [2] and stands in sharp contrast to the NP-hardness results on worst-case
instances. Our algorithm succeeds for an even weaker property, $\alpha$-center stability, introduced in [2].

Definition 4. A clustering instance $(S, d)$ is $\alpha$-center stable for the k-median objective if for any optimal
cluster $C_i \in \mathcal{C}$ with center $c_i$, $C_j \in \mathcal{C}$ ($j \ne i$) with center $c_j$, any point $p \in C_i$ satisfies $\alpha\, d(p, c_i) < d(p, c_j)$.
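Definition 4 translates directly into a check once a clustering and its centers are fixed. The sketch below assumes a distance matrix `d`, clusters given as lists of point indices, and `centers[i]` the index of the center of cluster $i$; these conventions are illustrative.

```python
import math


def is_center_stable(d, clusters, centers, alpha):
    """Check alpha * d(p, c_i) < d(p, c_j) for every p in C_i and every j != i."""
    for i, pts in enumerate(clusters):
        for p in pts:
            for j, cj in enumerate(centers):
                if j != i and not alpha * d[p][centers[i]] < d[p][cj]:
                    return False
    return True


# Two well-separated groups on a line; centers are the middle points.
pos = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
d = [[abs(a - b) for b in pos] for a in pos]
clusters = [[0, 1, 2], [3, 4, 5]]
centers = [1, 4]
print(is_center_stable(d, clusters, centers, 1 + math.sqrt(2)))
```

In this toy instance the tightest point is at distance 1 from its own center and 9 from the other, so the check passes for any $\alpha < 9$ and fails at $\alpha = 9$.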

Lemma 1. Any clustering instance that is $\alpha$-perturbation resilient to the k-median objective also satisfies
$\alpha$-center stability.

The proof follows easily by constructing a specific perturbation that blows up all the pairwise distances
within cluster $C_i$ by a factor of $\alpha$. By $\alpha$-perturbation resilience, the optimal clustering remains the same
after this perturbation. This then implies the desired result. The full proof appears in [2]. In the remainder
of this section, we prove our results for $\alpha$-center stability, but because it is a weaker condition, our upper
bounds also hold for $\alpha$-perturbation resilience.

We begin with some key properties of $\alpha$-center stable instances.

Lemma 2. For any points $p \in C_i$ and $q \in C_j$ ($j \ne i$) in the optimal clustering of an $\alpha$-center stable
instance, when $\alpha \ge 1 + \sqrt{2}$, we have: (1) $d(c_i, q) > d(c_i, p)$, (2) $d(p, c_i) < d(p, q)$.

Proof. (1) Lemma 1 gives us that $d(q, c_i) > \alpha\, d(q, c_j)$. By the triangle inequality, we have
$d(c_i, c_j) \le d(q, c_j) + d(q, c_i) < (1 + \frac{1}{\alpha})\, d(q, c_i)$. On the other hand, $d(p, c_j) > \alpha\, d(p, c_i)$ and therefore
$d(c_i, c_j) \ge d(p, c_j) - d(p, c_i) > (\alpha - 1)\, d(p, c_i)$. Combining these inequalities, we get (1).

(2) Also by the triangle inequality, $d(p, q) > (\alpha - 1) \max(d(p, c_i), d(q, c_j))$. (Its proof appears in [2].) □
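The combination step in part (1) can be spelled out as follows; this is a worked check of where the threshold $1 + \sqrt{2}$ comes from, written out here rather than quoted from [2].

```latex
\begin{align*}
(\alpha - 1)\, d(p, c_i) \;<\; d(c_i, c_j) \;<\; \Bigl(1 + \tfrac{1}{\alpha}\Bigr)\, d(q, c_i)
\quad\Longrightarrow\quad
d(p, c_i) \;<\; \frac{\alpha + 1}{\alpha(\alpha - 1)}\, d(q, c_i).
\end{align*}
% For \alpha > 1, the factor is at most 1 iff \alpha + 1 \le \alpha^2 - \alpha,
% i.e. iff \alpha^2 - 2\alpha - 1 \ge 0, i.e. iff \alpha \ge 1 + \sqrt{2}.
% Hence d(c_i, p) < d(c_i, q) whenever \alpha \ge 1 + \sqrt{2}.
```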

Lemma 2 implies that for any optimal cluster $C_i$, the ball of radius $\max_{p \in C_i} d(c_i, p)$ around the center
$c_i$ contains only points from $C_i$, and moreover, points inside the ball are each closer to the center than to any
point outside the ball. Inspired by this structural property, we define the notion of closure distance between
two sets as the radius of the minimum ball that covers the sets and has some margin from points outside the
ball. We show that any (strict) subset of an optimal cluster has smaller closure distance to another subset in
the same cluster than to any subset of other clusters or to unions of other clusters. Using this, we will be
able to define an appropriate linkage procedure that, when applied to the data, produces a tree on subsets
that will all be laminar with respect to the clusters in the optimal solution. This will then allow us to extract
the optimal solution using dynamic programming applied to the tree.

We now define the notion of closure distance and then present our algorithm for $\alpha$-perturbation resilient
instances (Algorithm 1). Let $\mathbb{B}(p, r) = \{q : d(q, p) \le r\}$.

Definition 5. The closure distance $d_S(A, A')$ between two disjoint non-empty subsets $A$ and $A'$ of point set
$S$ is the minimum $d \ge 0$ such that there is a point $c \in A \cup A'$ satisfying the following requirements:

(1) coverage: the ball $\mathbb{B}(c, d)$ covers $A$ and $A'$, i.e. $A \cup A' \subseteq \mathbb{B}(c, d)$;

(2) margin: points inside $\mathbb{B}(c, d)$ are closer to the center $c$ than to points outside,
i.e. $\forall p \in \mathbb{B}(c, d), q \notin \mathbb{B}(c, d)$, we have $d(c, p) < d(p, q)$.

Note that $d_S(A, A') = d_S(A', A) \le \max_{p, q \in S} d(p, q)$ for all $A, A'$, and it can be computed in polynomial time.

Algorithm 1 k-median, $\alpha$-perturbation resilience
Input: Data set $S$, distance function $d(\cdot, \cdot)$ on $S$.
Phase 1: Begin with $n$ singleton clusters.

• Repeat until only one cluster remains: merge the clusters $C, C'$ which minimize $d_S(C, C')$.

• Let $T$ be the tree with single points as leaves and internal nodes corresponding to the merges performed.

Phase 2: Apply dynamic programming on $T$ to get the minimum k-median cost pruning $\tilde{\mathcal{C}}$.
Output: Clustering $\tilde{\mathcal{C}}$.
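A direct, unoptimized reading of Definition 5 and of Phase 1 can be sketched as follows. It assumes a distance matrix `d` over points $0, \ldots, n-1$; the function names are illustrative, ties between equally close pairs are broken by iteration order, and no attempt is made to match the efficient implementation referred to in the appendix. The only candidate radii for a center $c$ are distances from $c$ to other points, since the ball changes only at those values.

```python
def closure_distance(d, A, B):
    """Smallest radius r such that some c in A|B has a ball covering A|B with margin."""
    n = len(d)
    both = set(A) | set(B)
    best = float("inf")
    for c in both:
        cover = max(d[c][x] for x in both)          # coverage forces r >= cover
        for r in sorted({d[c][x] for x in range(n) if d[c][x] >= cover}):
            if r >= best:
                break
            ball = [x for x in range(n) if d[c][x] <= r]
            outside = [x for x in range(n) if d[c][x] > r]
            # margin: every point in the ball is closer to c than to any outside point
            if all(d[c][p] < d[p][q] for p in ball for q in outside):
                best = r
                break
    return best


def closure_linkage(d):
    """Phase 1: repeatedly merge the pair of clusters with minimum closure distance.

    Returns the merges in order, each as a pair of frozensets.
    """
    clusters = [frozenset([x]) for x in range(len(d))]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            ((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]),
            key=lambda ab: closure_distance(d, ab[0], ab[1]),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((a, b))
    return merges


# Two well-separated groups on a line.
pos = [0, 1, 2, 10, 11, 12]
d = [[abs(x - y) for y in pos] for x in pos]
merges = closure_linkage(d)
merged = [a | b for a, b in merges]
print(merged)
```

On this toy instance every merged set is laminar with respect to the two natural groups, and both groups appear as nodes of the resulting tree.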



Theorem 1. For $(1 + \sqrt{2})$-center stable k-median instances, Algorithm 1 outputs the optimal k-median
clustering in polynomial time.

The proof follows immediately from the following key property of Phase 1 of Algorithm 1. The
details of dynamic programming are presented in Appendix F.1, and an efficient $O(n^3)$-time implementation
of the algorithm is presented in Appendix F.2.
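One natural way to carry out Phase 2 is a bottom-up recurrence over the tree: the best $j$-pruning of a node is either the node itself (when $j = 1$) or the best split of $j$ between its two children. The sketch below is a guess at that recurrence, not the appendix's exact procedure; it assumes the tree is given as nested 2-tuples of point indices and that a cluster's center may be any point of $S$.

```python
from functools import lru_cache


def best_pruning(d, tree, k):
    """Minimum k-median cost pruning of a binary tree given as nested 2-tuples."""

    def leaves(node):
        return (node,) if isinstance(node, int) else leaves(node[0]) + leaves(node[1])

    def one_median(pts):
        # cost of the best single center (any point of S) for the cluster pts
        return min(sum(d[p][c] for p in pts) for c in range(len(d)))

    @lru_cache(maxsize=None)
    def solve(node, j):
        pts = leaves(node)
        if j == 1:
            return one_median(pts), (pts,)
        if isinstance(node, int) or j > len(pts):
            return float("inf"), ()
        options = []
        for j1 in range(1, j):                      # split j between the two children
            cl, pl = solve(node[0], j1)
            cr, pr = solve(node[1], j - j1)
            options.append((cl + cr, pl + pr))
        return min(options, key=lambda t: t[0])

    return solve(tree, k)


pos = [0, 1, 2, 10, 11, 12]
d = [[abs(x - y) for y in pos] for x in pos]
tree = (((0, 1), 2), ((3, 4), 5))
cost, pruning = best_pruning(d, tree, 2)
print(cost, pruning)
```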

Theorem 2. For $(1 + \sqrt{2})$-center stable k-median instances, Phase 1 of Algorithm 1 (the closure linkage
phase) constructs a binary tree s.t. the optimal clustering is a pruning of this tree.
Proof. We prove correctness by induction. In particular, assume that our current clustering is laminar with
respect to the optimal clustering; that is, for each cluster $A$ in our current clustering and each $C$ in the
optimal clustering, we have either $A \subseteq C$, or $C \subseteq A$, or $A \cap C = \emptyset$. This is clearly true at the start. To
prove that the merge steps keep the laminarity, we need to show the following: if $A$ is a strict subset of an
optimal cluster $C_i$, and $A'$ is a subset of another optimal cluster or the union of one or more other clusters, then
there exists $B$ from $C_i \setminus A$, such that $d_S(A, B) < d_S(A, A') = d_S(A', A)$.

We first prove that there is a cluster $B \subseteq C_i \setminus A$ in the current cluster list such that $d_S(A, B) \le \hat{d} =
\max_{p \in C_i} d(c_i, p)$. There are two cases. First, if $c_i \notin A$, then define $B$ to be the cluster in the current cluster
list that contains $c_i$. By induction, $B \subseteq C_i$ and thus $B \subseteq C_i \setminus A$. Then we have $d_S(B, A) \le \hat{d}$ since there
is $c_i \in B$, and (1) for any $p \in A \cup B$, $d(c_i, p) \le \hat{d}$, (2) for any $p \in S$ satisfying $d(c_i, p) \le \hat{d}$, and any $q \in S$
satisfying $d(c_i, q) > \hat{d}$, by Lemma 2 we know $p \in C_i$ and $q \notin C_i$, and thus $d(c_i, p) < d(p, q)$. In the second
case when $c_i \in A$, we pick any $B \subseteq C_i \setminus A$ and a similar argument gives $d_S(A, B) \le \hat{d}$.

As a second step, we need to show that $\hat{d} < d = d_S(A, A')$. There are two cases: the center for
$d_S(A, A')$ is in $A$ or in $A'$. In the first case, there is a point $c \in A$ such that $c$ and $d$ satisfy the requirements
of the closure distance. Pick a point $q \in A'$, and define $C_j$ to be the cluster in the optimal clustering that
contains $q$. As $d(c, q) \le d$, and by Lemma 2 $d(c_j, q) < d(c, q)$, we must have $d(c_j, c) \le d$ (otherwise it
violates the second requirement of closure distance). Suppose $p = \arg\max_{p' \in C_i} d(c_i, p')$. Then we have
$\hat{d} = d(p, c_i) < d(p, c_j)/\alpha \le (\hat{d} + d(c_i, c) + d(c, c_j))/\alpha$, where the first inequality comes from Lemma 1 and
the second from the triangle inequality. Since $d(c_i, c) < d(c, c_j)/\alpha$, we can combine the above inequalities
and compare $\hat{d}$ and $d(c, c_j)$, and when $\alpha \ge 1 + \sqrt{2}$ we have $\hat{d} < d(c, c_j) \le d$.



[Figure 1: Illustration for comparing $\hat{d}$ and $d_S(A, A')$ in Theorem 2. Panels: case 1, $c \in A$; case 2, $c \in A'$.]



Now consider the second case, when there is a point $c \in A'$ such that $c$ and $d$ satisfy the requirements
in the definition of the closure distance. Select an arbitrary point $q \in A$. We have $d \ge d(c, q)$ from the first
requirement, and $d(c, q) > d(c_i, q)$ by Lemma 2. Then from the second requirement of closure distance,
$d(c_i, c) \le d$. And by Lemma 2, $\hat{d} = d(c_i, p) < d(c_i, c)$, so we have $\hat{d} < d(c_i, c) \le d$. □

Note: Our factor of $\alpha = 1 + \sqrt{2}$ beats the NP-hardness lower bound of $\alpha = 3$ of [2] for center-stable
instances. The reason is that the lower bound of [2] requires the addition of Steiner points that can act
as centers but are not part of the data to be clustered (though the upper bound of [2] does not allow such
Steiner points). One can also show a lower bound for center-stable instances without Steiner points. In
particular one can show that for any $\epsilon > 0$, the problem of solving $(2 - \epsilon)$-center stable k-median instances
isNP-hard HH. 
4 $(\alpha, \epsilon)$-Perturbation Resilience for the k-Median Objective

In this section we consider a natural relaxation of $\alpha$-perturbation resilience, the $(\alpha, \epsilon)$-perturbation
resilience property, which requires the optimum after perturbation of up to a multiplicative factor $\alpha$ to be $\epsilon$-close
to the original (one should think of $\epsilon$ as sub-constant). We show that if the instance is $(\alpha, \epsilon)$-perturbation
resilient, with $\alpha > 2 + \sqrt{7}$ and $\epsilon = O(\epsilon' \rho)$ where $\rho$ is the fraction of the points in the smallest cluster,
then we can in polynomial time output a clustering that provides a $(1 + \epsilon')$-approximation to the optimum.
Thus this improves over the best worst-case approximation guarantees known when $\epsilon' < 2$ and also beats
the lower bound of $(1 + 1/e)$ on the best approximation achievable on worst case instances for the metric
k-median objective [11, 12] when $\epsilon' < 1/e$.

The key idea is to understand and leverage the structure implied by $(\alpha, \epsilon)$-perturbation resilience. We
show that perturbation resilience implies that there exists only a small fraction of points that are bad in the
sense that their distance to their own center is not a factor $\alpha$ smaller than their distance to any other center in
the optimal solution. We then use this bounded number of bad points in our clustering algorithm.

4.1 Structure of $(\alpha, \epsilon)$-Perturbation Resilience

To understand $(\alpha, \epsilon)$-perturbation resilience, we need to consider the difference between the optimal
clustering $\mathcal{C}$ under $d$ and the optimal clustering $\mathcal{C}'$ under $d'$, defined as $\min_{\sigma \in S_k} \sum_{i=1}^{k} |C_i \setminus C'_{\sigma(i)}|$. Without
loss of generality, we assume in this subsection that $\mathcal{C}'$ is indexed so that the argmin $\sigma$ is the identity, and
the distance between $\mathcal{C}$ and $\mathcal{C}'$ is $\sum_{i=1}^{k} |C_i \setminus C'_i|$. We will denote by $c'_i$ the center of $C'_i$.

In the following we call a point good if it is a factor $\alpha$ closer to its own center than to any other center in
the optimal clustering; otherwise we call it bad. Let $B_i$ be the set of bad points in $C_i$. That is, $B_i = \{p :
p \in C_i, \exists j \ne i, \alpha\, d(c_i, p) > d(c_j, p)\}$. Let $G_i = C_i \setminus B_i$ be the good points in cluster $i$. Let $B = \cup_i B_i$ and
$G = \cup_i G_i$. We show that under perturbation resilience we do not have too many bad points. Formally:
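As a small illustration, the bad set can be computed directly from a clustering and its centers. The sketch assumes a distance matrix `d`, clusters as lists of indices, and `centers[i]` the index of $c_i$; the comparison follows the definition as printed above, and the function name is hypothetical.

```python
def bad_points(d, clusters, centers, alpha):
    """Set B of points p in C_i with alpha * d(c_i, p) > d(c_j, p) for some j != i."""
    bad = set()
    for i, pts in enumerate(clusters):
        for p in pts:
            if any(alpha * d[centers[i]][p] > d[cj][p]
                   for j, cj in enumerate(centers) if j != i):
                bad.add(p)
    return bad


# Point 3 (position 5) sits between the two groups: 4 from its center, 6 from the other.
pos = [0, 1, 2, 5, 10, 11, 12]
d = [[abs(x - y) for y in pos] for x in pos]
clusters = [[0, 1, 2, 3], [4, 5, 6]]
centers = [1, 5]
print(bad_points(d, clusters, centers, 2))
```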

Theorem 3. Suppose the clustering instance is (a, e)-perturbation resilient and minj \Gi\ > (2+ -^^)en + 
^J^^. Then 151 < en. 

Here we provide a proof sketch of the theorem; the full proof can be found in Appendix A. At the
end of Appendix A we also point out that the bound in Theorem 3 is an optimal bound for the bad points
in the sense that for any a > 1 and e < i, we can easily construct an (a, e) -perturbation resilient 2-median 
instance which has en bad points. 

Proof Sketch of Theorem 3. The main idea is to construct a specific perturbation that forces certain
selected bad points to move from their original optimal clusters. Then $(\alpha, \epsilon)$-perturbation resilience
leads to a bound on the number of selected bad points, which can also be proved to be a bound on all the bad
points. The selected bad points $\hat{B}_i$ in cluster $i$ are defined by arbitrarily selecting $\min(\epsilon n + 1, |B_i|)$ points
from $B_i$. Let $\hat{B} = \cup_i \hat{B}_i$. For $p \in \hat{B}_i$, let $c(p) = \arg\min_{c_j, j \ne i} d(p, c_j)$ denote its second nearest center;
for $p \in C_i \setminus \hat{B}_i$, let $c(p) = c_i$. The perturbation we consider blows up all distances by a factor of $\alpha$ except for
those distances between $p$ and $c(p)$. Formally, we define $d'$ as $d'(p, q) = d(p, q)$ if $p = c(q)$ or $q = c(p)$,
and $d'(p, q) = \alpha\, d(p, q)$ otherwise.
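The perturbation $d'$ is easy to build explicitly. In the sketch below, `selected_bad` plays the role of the selected bad points, `target[p]` plays the role of $c(p)$, and the matrix and index conventions are assumptions made for illustration.

```python
def proof_perturbation(d, clusters, centers, selected_bad, alpha):
    """Scale every distance by alpha except those between a point p and c(p)."""
    n = len(d)
    target = {}
    for i, pts in enumerate(clusters):
        for p in pts:
            if p in selected_bad:
                # second nearest center: the closest center of another cluster
                others = [c for j, c in enumerate(centers) if j != i]
                target[p] = min(others, key=lambda c: d[p][c])
            else:
                target[p] = centers[i]
    d_new = [[d[p][q] if target[q] == p or target[p] == q else alpha * d[p][q]
              for q in range(n)] for p in range(n)]
    return d_new, target


pos = [0, 1, 2, 5, 10, 11, 12]
d = [[abs(x - y) for y in pos] for x in pos]
clusters = [[0, 1, 2, 3], [4, 5, 6]]
centers = [1, 5]
dp, target = proof_perturbation(d, clusters, centers, {3}, 2)
print(dp[3][1], dp[3][5])
```

In this toy instance the selected bad point 3 is at perturbed distance 8 from its original center but still at distance 6 from the other center, so under $d'$ it prefers to move, which is the effect the proof sketch relies on.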

The key challenge in proving a bound on the selected bad points is to show that $c'_i = c_i$ for all $i$, i.e., the
optimal centers do not change after the perturbation. Then in the optimum under $d'$ each point $p$ is assigned
to the center $c(p)$, and therefore the selected bad points ($\hat{B}$) will move from their original optimal clusters.
By the $(\alpha, \epsilon)$-perturbation resilience property we get an upper bound on the number of selected bad points.

Suppose $C'_i$ is obtained by adding point set $A_i$ and removing point set $M_i$ from $C_i$, i.e. $A_i = C'_i \setminus
C_i$, $M_i = C_i \setminus C'_i$. Let $A = \cup_i A_i$, $M = \cup_i M_i$. At a high level, we prove that $c_i = c'_i$ for all $i$ as follows.
We first show that for each cluster, its new center is close to its old center, roughly speaking since the new
and old cluster have a lot in common (Claim 1). We then show that if $c_i \ne c'_i$ for some $i$, then the weighted sum
of the distances $\sum_{1 \le i \le k} (|A_i| + \alpha + 1 + |M_i|)\, d(c_i, c'_i)$ should be large (Claim 2). However, this contradicts
Claim 1, so the centers do not move after the perturbation.

In proving Claims 1 and 2 we need to translate $d'$ to $d$, and for doing so, a key fact we show is that for
any i, c[ 7^ Cj for any j 7^ i. The intuition is that if c[ = cj, then = cj should be farther to the points in 
Cj n Cj than c^-, thus cj should save a lot of cost on other parts of Cj than c'j, and by the triangle inequality, 
Cj should be far from c^ ; on the other hand, they are both close to points in C'j n Cj, which is contradictory. 

Claim 1. For each i, dsum{ci , (C^ n C^) \ A) > ^ (I n \ A I - I A \ M, | - | - (a + l))d{c, , c'J, 
i.e., d{ci,c[) can be bounded approximately by the average distance between Ci and a large portion ofCi. 

Proof Sketch: Write $\hat{B}_i$ for the selected bad points of cluster $i$. The key idea is that under $d'$, $c'_i$ is the optimal center for $C'_i$, so it has no more cost than $c_i$
on $C'_i$. Since $\hat{B}_i \setminus M_i$ and $A_i$ are small compared to $(C_i \cap C'_i) \setminus \hat{B}_i$, $c'_i$ cannot save much cost on $\hat{B}_i \setminus M_i$
and $A_i$, thus it cannot have much more cost on $(C_i \cap C'_i) \setminus \hat{B}_i$ than $c_i$. Then $c'_i$ is close to $(C_i \cap C'_i) \setminus \hat{B}_i$,
and so is $c_i$, so $c'_i$ is close to $c_i$. Formally, we have $d'_{sum}(c'_i, C'_i) \le d'_{sum}(c_i, C'_i)$. We divide $C'_i$ into three
parts $(C_i \cap C'_i) \setminus \hat{B}_i$, $\hat{B}_i \setminus M_i$ and $A_i$, and move the terms on $(C_i \cap C'_i) \setminus \hat{B}_i$ to one side (the cost more than $c_i$
on $(C_i \cap C'_i) \setminus \hat{B}_i$), and the remaining terms to the other side (the cost saved on $\hat{B}_i \setminus M_i$ and $A_i$). After translating the
terms from $d'$ to $d$, we apply the triangle inequality and obtain the desired result. □

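Claim 2. Let $I_i = 1$ if $c_i \ne c'_i$ and $I_i = 0$ otherwise. Then $(\alpha - 1) \sum_{1 \le i \le k} I_i\, d_{sum}(c_i, (C_i \cap C'_i) \setminus \hat{B}_i) \le
\alpha \sum_{1 \le i \le k} (|A_i| + \alpha + 1 + |M_i|)\, d(c_i, c'_i)$.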

Proof Sketch: We have ^ - dsum{ci, Ci) < dsum{c'i, Ci) by using the fact that Cj are the optimal 
centers for Cj under d; and ^Ij <«m(Cj, C'^) < EiKnm(ci, - Bi) + Y.p<.BMh "^i'^S 
fact that c'j are the optimal centers for Cj under d' . We multiply the first inequality by a and add it to the 
second inequality. Then we divide Cj into three parts Mj, Bi \ Mi and (Cj n C,') \ ^j, Cj into Ai, Bi\ Mi 
and (Cj n Cj) \ Bi, and Cj \ Bi into Ai and (Cj n Cj) \ B^. After translating the terms from c?' to d, we notice 
that the clustering that under d' assigns points in Cj \ Bi to Cj and points p in i?j \ Mj to c{p) (corresponding 
to the right hand side of the second inequality) saves as much cost as (a — 1) ^j dsum{ci, {Ci H Cj) \ Bi) 
on (Cj n Cj) \ Bi compared to the optimum clustering under d'. So, the optimum clustering under d' must 
save this cost on other parts, i.e. Ai and Bi \ Mi. By triangle inequality, we obtain the desired result. □ 

Combining Claims 1 and 2 we get

V ad{c^,(!i) [{\Ai\ + a + 1 + \Mi\) - ^^(|(C, n C'i) \ B^\ - \Bi \ M,| - \Ai\ - {a + l))h] > 0. 

l<i<k 

If /j = 0, we have d(cj, c'j) = 0; if /j = 1, since |Cj| > (2+ -^^)en + ^"q"^'*''* , the coefficient of d{ci, c'j) is 
negative. So the left hand side is no greater than 0. Therefore, all terms are equal to 0, i.e. for all $1 \le i \le k$,
$d(c_i, c'_i) = 0$. Then the selected bad points $\hat{B}_i$ will move to other clusters after perturbation, which means that $\hat{B}_i \subseteq M_i$,
thus $\hat{B} \subseteq M$. Then $|\hat{B}| \le |M| \le \epsilon n$. In particular, $|\hat{B}_i| \le \epsilon n$ for any $i$. Then $|B_i| \le \epsilon n$, since otherwise $|\hat{B}_i|$ would
be $\epsilon n + 1$. So $\hat{B}_i = B_i$, $\hat{B} = B$, and $|B| = |\hat{B}| \le \epsilon n$. □

4.2 Approximating the Optimum Clustering 

Since $(\alpha, \epsilon)$-perturbation resilient instances have at most $\epsilon n$ bad points, we can show that for $\alpha > 4$ such
instances satisfy the $\epsilon$-strict separation property (the property that after eliminating an $\epsilon$ fraction of the points,
the remaining points are closer to points in their own cluster than to other points in different clusters).
Therefore, we could use the clustering algorithms in [5, 6] to output a hierarchy such that the optimal
clustering is $\epsilon$-close to a pruning of this tree. However, this pruning might not have a small cost and it is not
clear how to retrieve a small cost clustering from the tree constructed by these generic algorithms. In
this section, we design a new algorithm for obtaining a good approximation to the optimum for
$(\alpha, \epsilon)$-perturbation resilient instances. This algorithm uses a novel linkage procedure to first construct a tree that we
further process to output a desired clustering. This linkage based procedure uses an approximate version of
the closure condition discussed in Section 3. We begin with the definition of the approximate closure condition.

Definition 6. Let $p, q \in S$ and assume $\mathcal{C}'$ is a clustering of $S$. Let $U_{p,q} = \{C \mid C \in \mathcal{C}', |C \setminus \mathbb{B}(p, d(p, q))| \le
\epsilon n, C \cap \mathbb{B}(p, d(p, q)) \ne \emptyset\}$. The ball $\mathbb{B}(p, d(p, q))$ satisfies the approximate closure condition with respect
to $\mathcal{C}'$ if $|\cup_{C \in U_{p,q}} C| \ge \min_i |C_i| - \epsilon n$ and the following conditions are satisfied:

(1) approximate coverage: the ball covers most of $U_{p,q}$, i.e. $|\cup_{C \in U_{p,q}} C \setminus \mathbb{B}(p, d(p, q))| \le \epsilon n$;⁴

(2) approximate margin: after removing a few points outside the ball, points inside are closer to each other
than to points outside, i.e. $\exists E \subseteq S \setminus \mathbb{B}(p, d(p, q))$, $|E| \le \epsilon n$, such that for any $p_1, p_2 \in \mathbb{B}(p, d(p, q))$,
$q_1 \in S \setminus \mathbb{B}(p, d(p, q)) \setminus E$, we have $d(p_1, p_2) < d(p_1, q_1)$.
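The condition can be checked directly for a given pair $(p, q)$. The sketch below is one possible reading: it treats the size thresholds as non-strict, represents the current clustering as a list of Python sets, and uses the observation that the smallest valid $E$ is exactly the set of outside points that violate the margin, so the margin test reduces to counting those violators. All names are illustrative.

```python
def approx_closure(d, clustering, p, q, eps, min_size):
    """Return (condition holds?, U_pq) for the ball B(p, d(p, q))."""
    n = len(d)
    r = d[p][q]
    budget = eps * n
    ball = {x for x in range(n) if d[p][x] <= r}
    # U_pq: clusters meeting the ball with at most eps*n points outside it
    U = [C for C in clustering if C & ball and len(C - ball) <= budget]
    union = set().union(*U)
    if len(union) < min_size - budget:
        return False, U
    if len(union - ball) > budget:                  # approximate coverage
        return False, U
    outside = set(range(n)) - ball
    # outside points that some inside pair fails to beat must all go into E
    violators = {q1 for q1 in outside
                 if any(d[p1][p2] >= d[p1][q1] for p1 in ball for p2 in ball)}
    return len(violators) <= budget, U              # approximate margin


pos = [0, 1, 2, 10, 11, 12]
d = [[abs(x - y) for y in pos] for x in pos]
singletons = [{x} for x in range(6)]
ok, U = approx_closure(d, singletons, 1, 0, 0.0, 3)
print(ok, U)
```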

We are now ready to present our main algorithm for $(\alpha, \epsilon)$-perturbation resilient instances,
Algorithm 2. Informally, it starts with singleton points in their own clusters. It then checks in increasing order
of $d(p, q)$ whether the approximate closure condition is satisfied for $\mathbb{B}(p, d(p, q))$, and if so it merges all the
clusters in the current clustering nearly contained within the ball $\mathbb{B}(p, d(p, q))$. As we show below, the tree
produced has a pruning that respects the optimal clustering. However, this pruning may contain more than
$k$ clusters, so in the second phase, we clean the tree so that we can ensure there is a pruning with $k$ clusters
that coincides with the optimal clustering on the good points. Finally we run dynamic programming to get
the minimum cost pruning, which as we prove provides a good approximation to the optimal clustering.

Algorithm 2 $k$-median, $(\alpha, \epsilon)$-perturbation resilience

Input: Data set $S$, distance function $d(\cdot, \cdot)$ on $S$, $\min_i |C_i|$, $\epsilon > 0$.
Phase 1: Sort all the pairwise distances $d(p, q)$ in ascending order.

• Initialize $\mathcal{C}'$ to be the clustering with each singleton point being a cluster.

• For $d(p, q)$ in ascending order,

• If $\mathbb{B}(p, d(p, q))$ satisfies the approximate closure condition and $|U_{p,q}| > 1$, then merge $U_{p,q}$.

• Construct the tree $T$ with points as leaves and internal nodes corresponding to the merges performed.
Phase 2: If a node has only singleton points as children, delete its children from $T$; call the resulting tree $T'$.

• For any remaining singleton node $p$ in $T'$, assign $p$ to the non-singleton leaf of smallest median distance.

Phase 3: Apply dynamic programming on the cleaned tree $T'$ to get the minimum $k$-median cost pruning $\tilde{\mathcal{C}}$.
Output: Clustering $\tilde{\mathcal{C}}$, (optional) tree $T$.
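Phase 3 is a standard dynamic program over the tree. The following sketch shows one way it could look; it is not the authors' implementation. It assumes that each node stores the tuple of points below it, that the children of a node partition its points, and that the $k$-median cost of a cluster is computed with the center restricted to the cluster's own points. The names `Node`, `kmedian_cost` and `best_pruning` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    points: tuple                      # all points below this node
    children: list = field(default_factory=list)


def kmedian_cost(points, d):
    """Cost of one cluster, with the center chosen among its own points."""
    return min(sum(d(c, x) for x in points) for c in points)


def best_pruning(root, k, d):
    """Return (cost, clusters) of a cheapest pruning with exactly k clusters,
    or (inf, None) if the tree has no pruning of that size."""

    def solve(node):
        # table[j] = (cost, clusters) for cutting this subtree into j clusters
        table = {1: (kmedian_cost(node.points, d), [node.points])}
        if node.children:
            combo = {0: (0.0, [])}
            for child in node.children:
                child_table = solve(child)
                nxt = {}
                for a, (cost_a, cl_a) in combo.items():
                    for b, (cost_b, cl_b) in child_table.items():
                        j = a + b
                        if j <= k and (j not in nxt or cost_a + cost_b < nxt[j][0]):
                            nxt[j] = (cost_a + cost_b, cl_a + cl_b)
                combo = nxt
            for j, val in combo.items():
                if j >= 2 and (j not in table or val[0] < table[j][0]):
                    table[j] = val
        return table

    return solve(root).get(k, (math.inf, None))
```

The per-node table is combined across children in knapsack fashion, so nodes with more than two children (which can arise because a merge step may join several clusters at once) are handled as well.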

Our main result in this section is the following. 

Theorem 4. Suppose the clustering instance is $(\alpha, \epsilon)$-perturbation resilient to $k$-median. If $\alpha > 2 + \sqrt{7}$ and $\epsilon \le \rho/5$ where $\rho = (\min_i |C_i| - 15)/n$, then in polynomial time, Algorithm 2 outputs a tree $T$ that contains a pruning that is $\epsilon$-close to the optimum clustering. Moreover, if $\epsilon \le \rho\epsilon'/5$ where $\epsilon' \le 1$, the clustering produced is a $(1 + \epsilon')$-approximation to the optimum.

Theorem 4 follows immediately from the following lemmas. The details can be found in Appendix B.

Lemma 3. If $\alpha > 2 + \sqrt{7}$, $\epsilon \le \rho/5$, then the tree $T$ contains nodes $N_i$ $(1 \le i \le k)$ such that $N_i \setminus B = C_i \setminus B$.

* Note that in the definition of $U_{p,q}$, each cluster in it has at most $\epsilon n$ points outside the ball $\mathbb{B}(p, d(p, q))$. But the approximate coverage condition is stronger: $U_{p,q}$, as a whole, can have at most $\epsilon n$ points outside.
Proof Sketch: For each $i$, we let $q_i^* = \arg\max_{q_i \in C_i \setminus B} d(c_i, q_i)$. The proof follows from two key facts:
(1) If $\mathcal{C}' \setminus B$ is laminar to $\mathcal{C} \setminus B$ right before checking some $d(p, q)$, then for any $i, j$, $i \ne j$, such that either $d(p, q)$ is checked before $d(c_i, q_i^*)$, or $d(p, q)$ is checked before $d(c_j, q_j^*)$, or $p = c_i, q = q_i^*$, or $p = c_j, q = q_j^*$, we have that $U_{p,q}$ cannot contain good points from both $C_i$ and $C_j$. (2) If $\mathcal{C}' \setminus B$ is laminar to $\mathcal{C} \setminus B$ right before checking $d(c_i, q_i^*)$, then right after checking $d(c_i, q_i^*)$ there is a cluster containing all the good points in cluster $i$ and no other good points.

Consider any merge step such that $U_{p,q}$ contains good points from both $C_i$ and $C_j$ $(j \ne i)$. Fact (1) implies that both $d(c_i, q_i^*)$ and $d(c_j, q_j^*)$ must have been checked, so fact (2) implies that all good points in $C_i$ and $C_j$ respectively have already been merged. So the laminarity is always satisfied. Then the lemma follows from fact (2).

We now prove fact (1). Suppose for contradiction that there exist good points from $C_i$ and $C_j$ in $U_{p,q}$. From the laminarity assumption, the fact that clusters in $U_{p,q}$ have only $\epsilon n$ points outside $\mathbb{B}(p, d(p, q))$, and $|B| \le \epsilon n$, we can show that there exist good points $p_i \in C_i$ and $p_j \in C_j$ in $\mathbb{B}(p, d(p, q))$. When $\alpha > 2 + \sqrt{7}$ we can show $d(c_i, q_i^*) < d(p_i, p_j)/2$, and by the triangle inequality $d(p_i, p_j)/2 \le d(p, q)$, so $d(p, q) > d(c_i, q_i^*)$. The same argument leads to $d(p, q) > d(c_j, q_j^*)$. This contradicts the assumption that $d(p, q)$ is checked before $d(c_i, q_i^*)$ or before $d(c_j, q_j^*)$, or $p = c_i, q = q_i^*$, or $p = c_j, q = q_j^*$.

We now prove fact (2). It is sufficient to show that $\bigcup_{C \in U_{c_i, q_i^*}} C \setminus B = C_i \setminus B$ and that $U_{c_i, q_i^*}$ satisfies the approximate closure condition. First, $U_{c_i, q_i^*}$ contains no good points outside $C_i$ by fact (1). Second, any $C$ containing good points from $C_i$ is in $U_{c_i, q_i^*}$: by fact (1), $C$ has no good points outside $C_i$, and since $\mathbb{B}(c_i, d(c_i, q_i^*))$ contains all good points in $C_i$, $C$ has only bad points outside the ball, so $C \in U_{c_i, q_i^*}$. We finally show that $U_{c_i, q_i^*}$ satisfies the approximate closure condition. Since, in addition to all good points in $C_i$, $\bigcup_{C \in U_{c_i, q_i^*}} C$ can only contain bad points, it has at most $\epsilon n$ points outside $\mathbb{B}(c_i, d(c_i, q_i^*))$, so the approximate coverage condition is satisfied. We can also show that for $\alpha > 2 + \sqrt{7}$, $2d(c_i, q_i^*)$ is smaller than the distance between any point in $\mathbb{B}(c_i, d(c_i, q_i^*))$ and any good point outside $C_i$. Then, letting $E = B \setminus \mathbb{B}(c_i, d(c_i, q_i^*))$, the approximate margin condition is satisfied. We also have $|\bigcup_{C \in U_{c_i, q_i^*}} C| \ge |C_i \setminus B| \ge \min_j |C_j| - \epsilon n$. □

Lemma 4. If $\alpha > 2 + \sqrt{7}$, $\epsilon \le \epsilon'\rho/5$ where $\epsilon' \le 1$, then $\tilde{\mathcal{C}}$ is a $(1 + \epsilon')$-approximation to the optimum.

Proof Sketch: By Lemma 3, $T$ has a pruning $\mathcal{P}$ that contains $N_i$ $(1 \le i \le k)$ and possibly some bad points, such that $N_i \setminus B = C_i \setminus B$. Therefore, each non-singleton leaf in $T'$ has only good points from one optimal cluster and has more good points than bad points. This then implies that each singleton good point in $T'$ is assigned to a leaf that has good points from its own optimal cluster.

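So $\mathcal{P}$ becomes $\mathcal{P}' = \{N'_i\}$ such that $N'_i \setminus B = C_i \setminus B$. It is sufficient to prove that the cost of $\mathcal{P}'$ approximates $\mathcal{OPT}$, i.e. to bound the increase of cost caused by a bad point $p_j \in C_j$ ending up in $N'_i$ $(i \ne j)$. There are two cases: $p_j$ belongs to a non-singleton leaf node in $T'$, or $p_j$ is a singleton in $T'$. In either case, the leaf in which $p_j$ ends up in $T'$ contains at least $K = (\min_i |C_i| - \epsilon n)/2 - \epsilon n$ good points $p_{it}$ from $C_i$ such that for any other leaf containing only good points from $C_j$ we can find at least $K$ good points $p_{js}$ from $C_j$ satisfying $d(p_j, p_{it}) \le d(p_j, p_{js})$. Then the increase of cost due to $p_j$ can be bounded as follows.

$$d(p_j, c_i) - d(p_j, c_j) \le \frac{1}{K}\Big[\sum_{1 \le t \le K} \big[d(p_j, p_{it}) + d(p_{it}, c_i)\big] - \sum_{1 \le s \le K} \big[d(p_j, p_{js}) - d(p_{js}, c_j)\big]\Big] \le \frac{1}{K}\mathcal{OPT}.$$

As $|B| \le \epsilon n$, the cost of $\mathcal{P}'$ is at most $(1 + \frac{\epsilon n}{K})\mathcal{OPT}$, so the minimum cost pruning $\tilde{\mathcal{C}}$ is a $(1 + \frac{2\epsilon n}{\min_i |C_i| - 3\epsilon n})$-approximation to the optimum. By setting $\epsilon' \ge \frac{2\epsilon n}{\min_i |C_i| - 3\epsilon n}$, we get the desired result. □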

We note that the approximate margin condition in Definition 6 can be verified in $O(n^3)$ time by enumerating $p_1, p_2 \in \mathbb{B}(p, d(p, q))$ and $q_1 \notin \mathbb{B}(p, d(p, q))$, and checking that there are no more than $\epsilon n$ points $q_1$ for which some pair $p_1, p_2$ violates the condition. So Algorithm 2 runs in polynomial time.
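The enumeration just described can be sketched as follows. This is an illustrative check, not the paper's code: the ball is assumed closed, the margin inequality is assumed strict, and the function name `approximate_margin_ok` is hypothetical. The key observation is that the smallest admissible set $E$ consists exactly of the outside points $q_1$ that violate the margin for some pair inside the ball, so it suffices to count them.

```python
def approximate_margin_ok(points, d, p, q, eps_n):
    """Return True if at most eps_n points outside B(p, d(p,q)) must be
    removed so that every inside pair is closer than any inside-outside pair."""
    r = d(p, q)
    inside = [x for x in points if d(p, x) <= r]
    outside = [x for x in points if d(p, x) > r]
    violators = 0
    for q1 in outside:
        # q1 has to be placed in E if some inside pair (p1, p2)
        # fails d(p1, p2) < d(p1, q1).
        if any(d(p1, p2) >= d(p1, q1) for p1 in inside for p2 in inside):
            violators += 1
            if violators > eps_n:
                return False
    return True
```

The three nested loops over `outside`, `inside` and `inside` give the cubic running time mentioned above.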
4.3 Sublinear Time Algorithm for the k-Median Objective

Consider a clustering instance $(X, d)$ that is $(\alpha, \epsilon)$-perturbation resilient to $k$-median. Let $N = |X|$, let $\rho = \min_i |C_i|/N$ denote the fraction of the points in the smallest cluster, let $D = \max_{p,q \in X} d(p, q)$ denote the diameter of $X$, and let $\zeta = \Phi_X(c)/N$ denote the average cost of the points in the optimum clustering. We have:

Theorem 5. Suppose {X, d) is (a, e) -perturbation resilient for a > 8, e < p/20. Let < A < 1. Then w.p. 
> 1 — 6, we can get an implicit clustering that is 2j^(l + j^j^)-approximation in time 0{{^^^ In ^)^). 

Proof Sketch: We sample a set S of size n = Q{j^^^ln^) and run Algorithm [2] on S to obtain 

the minimum cost pruning $\tilde{\mathcal{C}}$ and its centers $\tilde{c}$. We then output the implicit clustering of the whole space $X$ that assigns each point in $X$ to its nearest neighbor in $\tilde{c}$. Here we describe a proof sketch that shows $\Phi_X(\tilde{c}) \approx \Phi_X(c)$; the full proof can be found in Appendix C.

When $n$ is sufficiently large, w.h.p. $\Phi_X(\tilde{c})/N \approx \Phi_S(\tilde{c})/n$ and $\Phi_X(c)/N \approx \Phi_S(c)/n$, so it is sufficient to show that $\Phi_S(\tilde{c})$ is not much larger than $\Phi_S(c)$. However, $\tilde{\mathcal{C}}$ may be different from $\mathcal{C} \cap S$, so we need a bridge between the two.

Notice that w.h.p. $S$ has only $2\epsilon n$ bad points, and each cluster $C_i \cap S$ is large. Moreover, when $\alpha > 8$, any good point is 3 times closer to those in the same cluster than to those in other clusters. Then even if the centers $c$ are not sampled, we can choose an arbitrary good point from each cluster $C_i \cap S$ to be its center, so that we can still prove that Algorithm 2 forms nodes $N_i$ approximating the clusters, and $T$ has a pruning $\mathcal{P}'$ which differs from $\mathcal{C} \cap S$ only on the bad points. Suppose that in $S$, $c'$ are the optimal centers for $\mathcal{P}'$. Then we can use $\Phi_S(\mathcal{P}', c')$ as a bridge for comparing $\Phi_S(\tilde{c})$ and $\Phi_S(c)$.

On one hand, $\Phi_S(\tilde{c}) \le \Phi_S(\mathcal{P}', c')$. This is because (1) since $\tilde{\mathcal{C}}$ is the minimum cost pruning, $\Phi_S(\tilde{\mathcal{C}}, \tilde{c}) \le \Phi_S(\mathcal{P}', c')$; (2) since in $\Phi_S(\tilde{c})$ each point is assigned to its nearest center but in $\Phi_S(\tilde{\mathcal{C}}, \tilde{c})$ this may not be true, $\Phi_S(\tilde{c}) \le \Phi_S(\tilde{\mathcal{C}}, \tilde{c})$. On the other hand, $\Phi_S(\mathcal{P}', c')$ is not much larger than $\Phi_S(c)$. This is because (1) $\Phi_S(\mathcal{P}', c)$ differs from $\Phi_S(c)$ only on the bad points, so by an approach similar to that in Lemma 4 we can show that the increase of cost is limited; (2) by the triangle inequality we have $\Phi_S(\mathcal{P}', c') \le 2\Phi_S(\mathcal{P}', c)$. □
Note: If we have an oracle that, given a set of points $C'_i$, finds the best center in $X$ for that set, then we can save a factor of 2 in the bound.
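The output of the sublinear-time procedure is implicit: it is a rule that maps any point of $X$ to a cluster, computed from centers found on the sample. The sketch below illustrates only the sampling and assignment steps. The function names are hypothetical, the sample size is passed in rather than derived from the bound in Theorem 5, and running Algorithm 2 on the sample to obtain the centers is left out.

```python
import random


def sample_points(population, n, seed=None):
    """Draw n distinct points uniformly at random (without replacement)."""
    return random.Random(seed).sample(list(population), n)


def implicit_clustering(centers, d):
    """Return a function assigning any point x to the index of its
    nearest center; no pass over the full data set is needed."""
    def assign(x):
        return min(range(len(centers)), key=lambda i: d(x, centers[i]))
    return assign
```

A caller would obtain `centers` by running the clustering algorithm on `sample_points(X, n)` and could then label points lazily, for example `assign = implicit_clustering(centers, d)` followed by `assign(x)` for any `x` of interest.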

5 α-Perturbation Resilience for the Min-Sum Objective

In this section we provide an efficient algorithm for clustering $\alpha$-perturbation resilient instances for the min-sum $k$-clustering problem (Algorithm 3). We use the following notation: $d_{avg}(A, B) = d_{sum}(A, B)/(|A||B|)$ and $d_{avg}(p, B) = d_{avg}(\{p\}, B)$.

Algorithm 3 min-sum, $\alpha$-perturbation resilience

Input: Data set $S$, distance function $d(\cdot, \cdot)$ on $S$, $\min_i |C_i|$.
Phase 1: Connect each point with its $\frac{1}{2}\min_i |C_i|$ nearest neighbors.

• Initialize the clustering $\mathcal{C}'$ with each connected component being a cluster.

• Repeat until only one cluster remains in $\mathcal{C}'$: merge the clusters $C, C'$ in $\mathcal{C}'$ which minimize $d_{avg}(C, C')$.

• Let $T$ be the tree with components as leaves and internal nodes corresponding to the merges performed.

Phase 2: Apply dynamic programming on $T$ to get the minimum min-sum cost pruning $\tilde{\mathcal{C}}$.
Output: Clustering $\tilde{\mathcal{C}}$.
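Phase 1 of Algorithm 3 can be sketched as below. This is a simple illustration rather than an efficient implementation: the neighbor graph is treated as undirected, ties between equidistant neighbors are broken by input order, clusters are tuples of points, and the names `knn_components`, `d_avg` and `average_linkage` are hypothetical. The parameter `m` plays the role of $\frac{1}{2}\min_i |C_i|$.

```python
def knn_components(points, d, m):
    """Connected components of the graph that links every point
    to its m nearest neighbors (union-find with path halving)."""
    parent = {x: x for x in points}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x in points:
        others = sorted((y for y in points if y != x), key=lambda y: d(x, y))
        for y in others[:m]:
            parent[find(x)] = find(y)

    comps = {}
    for x in points:
        comps.setdefault(find(x), []).append(x)
    return [tuple(c) for c in comps.values()]


def d_avg(a, b, d):
    """Average distance between two clusters: d_sum(A, B) / (|A| |B|)."""
    return sum(d(x, y) for x in a for y in b) / (len(a) * len(b))


def average_linkage(components, d):
    """Repeatedly merge the pair of clusters with smallest d_avg.
    Returns the merges in the order performed."""
    clusters = list(components)
    merges = []
    while len(clusters) > 1:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: d_avg(clusters[ij[0]], clusters[ij[1]], d))
        a, b = clusters[i], clusters[j]
        merges.append((a, b))
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [a + b]
    return merges
```

The list of merges determines the tree $T$: the components are its leaves and each recorded pair becomes an internal node.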



Theorem 6. For (3 ^ , )-perturbation resilient instances, Algorithm 3 outputs the optimal min-sum $k$-clustering in polynomial time.
Proof Sketch: First we show that the $\alpha$-perturbation resilience property implies that for any two different optimal clusters $C_i$ and $C_j$ and any $A \subseteq C_i$, we have $\alpha\, d_{sum}(A, C_i \setminus A) < d_{sum}(A, C_j)$. This follows by considering the perturbation where $d'(p, q) = \alpha d(p, q)$ if $p \in A$, $q \in C_i \setminus A$ and $d'(p, q) = d(p, q)$ otherwise, and using the fact that the optimum does not change after the perturbation. This can be used to show that when $\alpha$ satisfies the bound in Theorem 6 we have: (1) for any optimal clusters $C_i$ and $C_j$ and any $A \subseteq C_i$, $A' \subseteq C_j$ such that $\min(|C_i \setminus A|, |C_j \setminus A'|) > \min_l |C_l|/2$, we have $d_{avg}(A, A') > \min\{d_{avg}(A, C_i \setminus A), d_{avg}(A', C_j \setminus A')\}$; (2) for any point $p$ in the optimal cluster $C_i$, twice its average distance to points in $C_i \setminus \{p\}$ is smaller than the distance to any point in any other optimal cluster $C_j$. Fact (2) implies that for any point $p \in C_i$ its $|C_i|/2$ nearest neighbors are in the same optimal cluster, so the leaves of the tree $T$ are laminar to the optimum clustering. Fact (1) can be used to show that the merge steps preserve the laminarity with the optimal clustering, so the minimum cost pruning of $T$ will be the optimal clustering, as desired. The full proof can be found in Appendix D. □
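The perturbation used at the start of this proof sketch is easy to write down explicitly. The snippet below is a small illustration under the assumption that the blow-up is applied symmetrically to every pair with one endpoint in $A$ and the other in $C_i \setminus A$; the function name `perturb` is hypothetical.

```python
def perturb(d, A, rest, alpha):
    """Return d', which multiplies by alpha the distances between A and
    rest (standing for C_i minus A) and leaves all other distances unchanged."""
    A, rest = set(A), set(rest)

    def d_prime(p, q):
        crossing = (p in A and q in rest) or (q in A and p in rest)
        return alpha * d(p, q) if crossing else d(p, q)

    return d_prime
```

Every distance satisfies $d \le d' \le \alpha d$, so $d'$ is a valid $\alpha$-perturbation, which is what allows the resilience assumption to be applied to it.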

Theorem 7. Suppose the clustering instance {X, d) is a-perturbation resilient to the min-sum objective 
where a > S^^j^^q^j^p^. Then w.p. > 1—6, we can get an implicit optimum clustering in time 0{{-^^ In ^)^) 
where $\eta = \min_{p \in X, 1 \le i \le k} d_{avg}(p, C_i)$ is the minimum average distance between points and optimal clusters.

Proof Sketch: We sample a set S of size n = \n.^) and run Algorithm [3] on S. We then output 

the implicit clustering of the whole space X that assigns each point p G X to Cj G C s. t. dgum {p, Ci) is min- 
imized. We have that for any p £ Ci and Cj (j / ^)>3 l*Jf,p^l_|-i^ dsum(P) Cj n S) < dsumi^Pi Cj Pi 5*), sincc 
when n is sufficiently large, dsum{p,CinS) ^ dsum{p,Ci)\S\/\X\, dsum{p,CjnS) ^ dsum{p,Cj)\S\/\X\ 
and "^^^'I'^g^l ~ ""^^vl^'l . So the tree T is laminar to C n 5. Since clusters in C n 5 are far apart, the 
cost increased by joining different clusters in it is larger than that saved by splitting clusters, so $\mathcal{C} \cap S$ is the minimum cost pruning. Hence Algorithm 3 on $S$ outputs $\mathcal{C} \cap S$, and the theorem follows. The full proof can be found in Appendix E. □



6 Discussion and Open Questions 

In this work, we advance the line of research on perturbation resilience in clustering in multiple ways. For $\alpha$-perturbation resilient instances, we improve on the known guarantees for $k$-median and give the first analysis for min-sum. Furthermore, we analyze and give the first algorithmic guarantees known for a relaxed but more challenging condition of $(\alpha, \epsilon)$-perturbation resilience, where an $\epsilon$ fraction of points are allowed to move after perturbation. We also give sublinear-time algorithms for $k$-median and min-sum under perturbation resilience.

A natural direction for future investigation is to explore whether one can take advantage of smaller perturbation factors for perturbation resilient instances in Euclidean spaces. More broadly, it would be interesting to explore other ways in which perturbation resilient instances behave better than worst case instances (e.g., natural algorithms converge faster).
A Full Proof of Theorem 3



The main idea for proving the theorem is to construct a specific perturbation that forces certain selected bad points to move from their original optimal clusters. Then the $(\alpha, \epsilon)$-perturbation resilience leads to a bound on the number of the selected bad points, which can also be proved to be a bound on the number of all the bad points.

Specifically, the selected bad points $\hat{B}_i$ in cluster $i$ are defined by arbitrarily selecting $\min(\epsilon n + 1, |B_i|)$ points from $B_i$. Let $\hat{B} = \bigcup_i \hat{B}_i$. For $p \in \hat{B}_i$, let $c(p) = \arg\min_{c_j, j \ne i} d(p, c_j)$ denote its second nearest center; for $p \in C_i \setminus \hat{B}_i$, $c(p) = c_i$. The perturbation we consider blows up all distances by a factor of $\alpha$ except for those distances between $p$ and $c(p)$. Formally,

$$d'(p, q) = \begin{cases} d(p, q) & \text{if } p = c(q) \text{ or } q = c(p), \\ \alpha\, d(p, q) & \text{otherwise.} \end{cases}$$

Suppose after the perturbation, $C'_i$ is obtained by adding point set $A_i$ and removing point set $M_i$ from $C_i$, i.e. $A_i = C'_i \setminus C_i$, $M_i = C_i \setminus C'_i$. Let $A = \bigcup_i A_i$, $M = \bigcup_i M_i$. For convenience, we use $W_i = (C_i \cap C'_i) \setminus \hat{B}_i$ and $V_i = \hat{B}_i \setminus M_i$, and thus we have $C_i = W_i \cup V_i \cup M_i$ and $C'_i = W_i \cup V_i \cup A_i$.

Figure 2: Illustration for notations. Note that $W_i = (C_i \cap C'_i) \setminus \hat{B}_i$, $V_i = \hat{B}_i \setminus M_i$ and $C'_i = W_i \cup V_i \cup A_i$.

Given the perturbation, we prove that all the selected bad points move by showing that $c_i = c'_i$ for all $i$. We first show that for each cluster, its new center is close to its old center, roughly speaking because the new and old cluster have a lot in common (Claim 1). We then show that if $c_i \ne c'_i$ for some $i$, then the weighted sum of the distances $\sum_{1 \le i \le k} (|A_i| + \alpha + 1 + |M_i|)\, d(c_i, c'_i)$ should be large (Claim 2). However, this contradicts Claim 1, so the centers do not move after the perturbation.

In the proofs of Claims 1 and 2 we make use of the triangle inequality frequently. However, $d'$ is not a metric, so we need to translate $d'$ to $d$. We begin with some useful facts and use them to prove a summarization of the translation from $d'$ to $d$.

Fact 1. Suppose the clustering instance is $(\alpha, \epsilon)$-perturbation resilient.

(1) If $c(c'_i) \in A_i$, then $d(c_i, c(c'_i)) \le (1 + \alpha)\, d(c_i, c'_i)$.

(2) If $\min_i |C_i| > (\frac{2\alpha}{\alpha - 1} + 2)\epsilon n + 1$, then $c'_i \ne c_j$ $(\forall j \ne i)$.

Proof. (1) Let $c_l = c(c'_i)$. There are three cases: 1. $c'_i$ is in $C_i$; 2. $c'_i$ is outside $C_i$ and is a selected bad point; 3. $c'_i$ is outside $C_i$ and is not a selected bad point.
Case 1: $c'_i$ is in $C_i$. Since $c'_i \in C'_i$, we know either $c'_i \in W_i$ or $c'_i \in V_i$. If $c'_i \in W_i$, then $c(c'_i) = c_i \notin A_i$, which contradicts the assumption that $c(c'_i) \in A_i$, so it must be that $c'_i \in V_i$. We have

$$d(c_i, c_l) \le d(c_i, c'_i) + d(c'_i, c_l) \le (1 + \alpha)\, d(c_i, c'_i)$$

where the last inequality comes from the fact that $c'_i$ is a selected bad point and $c(c'_i) = c_l$.
Case 2: In this case, $c'_i$ is from $\hat{B}_j$ for some $j$ $(j \ne i, j \ne l)$ and moves to $C'_i$ after perturbation, i.e. $c'_i \in \hat{B}_j \cap A_i$. Then we have

$$d(c_i, c_l) \le d(c_i, c'_i) + d(c'_i, c_l) \le d(c_i, c'_i) + \alpha\, d(c'_i, c_j) \le (1 + \alpha)\, d(c_i, c'_i)$$

where the second inequality comes from the fact that $c'_i$ is a selected bad point and $c(c'_i) = c_l$, and the last inequality comes from $c'_i \in C_j$.

Case 3: In this case, $c'_i$ is from $C_l \setminus \hat{B}_l$ and moves to $C'_i$ after perturbation, i.e. $c'_i \in (C_l \setminus \hat{B}_l) \cap A_i$. We have

$$d(c_i, c_l) \le d(c_i, c'_i) + d(c'_i, c_l) \le 2d(c_i, c'_i) \le (1 + \alpha)\, d(c_i, c'_i)$$

where the second inequality comes from $c'_i \in C_l$ and the last inequality comes from $\alpha > 1$.
(2) Assume $c'_i = c_j$. Then we would have the following: $c'_j \ne c_l$ $(\forall l)$. First, $c'_j \ne c_j$, since otherwise moving all the points in $C'_j$ to $C'_i$ will not increase the cost, which violates $(\alpha, \epsilon)$-perturbation resilience. We also know that $c'_j \ne c_l$ $(l \ne j)$, since otherwise there is $p \in W_j$ with $d(c_l, p) = d(c'_j, p) \le d'(c'_j, p) \le d'(c'_i, p) = d(c_j, p)$, which contradicts the fact that $p \in C_j$.

Now we need to show that $c'_i = c_j$ and $c'_j \ne c_l$ $(\forall l)$ lead to a contradiction. First we can lower bound $d(c_j, c'_j)$. Due to $(\alpha, \epsilon)$-perturbation resilience, $W_j \cup V_j = C_j \cap C'_j$ satisfies $|W_j \cup V_j| \ge |C_j| - \epsilon n$. These many points are closer to $c'_j$ than to $c'_i = c_j$ under $d'$. However, back in $d$, $c_j$ is the optimal center for $C_j = W_j \cup V_j \cup M_j$, so it should save a lot of cost on $M_j$ compared to $c'_j$, which suggests that $c_j$ and $c'_j$ would be far apart. Formally, by the fact that $c_j$ is the optimal center for $C_j$, we have

$$d_{sum}(c'_j, C_j) = d_{sum}(c'_j, W_j \cup V_j \cup M_j) \ge d_{sum}(c_j, C_j) = d_{sum}(c_j, W_j \cup V_j \cup M_j).$$

To simplify the inequality, notice that for any $p \in W_j$, $\alpha d(c'_j, p) = d'(c'_j, p) \le d'(c'_i, p) = d(c_j, p)$, resulting in $d_{sum}(c'_j, W_j) \le d_{sum}(c_j, W_j)/\alpha$. For any $p \in V_j$, $\alpha d(c'_j, p) = d'(c'_j, p) \le d'(c'_i, p) = \alpha d(c_j, p)$, resulting in $d_{sum}(c'_j, V_j) \le d_{sum}(c_j, V_j)$. So the inequality becomes

$$d_{sum}(c'_j, M_j) - d_{sum}(c_j, M_j) \ge d_{sum}(c_j, W_j) - d_{sum}(c'_j, W_j),$$

$$|M_j|\, d(c'_j, c_j) \ge \Big(1 - \frac{1}{\alpha}\Big)\, d_{sum}(c_j, W_j). \qquad (1)$$

Second, we can upper bound $d(c_j, c'_j)$ since points in $W_j$ are close to both $c_j$ and $c'_j$. Let $p^* = \arg\min_{p \in W_j} d(p, c_j)$, then

$$d(c_j, c'_j) \le d(c_j, p^*) + d(c'_j, p^*) \le 2d(c_j, p^*) \le 2\, d_{sum}(c_j, W_j)/|W_j|. \qquad (2)$$

When $|C_j| > (\frac{2\alpha}{\alpha - 1} + 2)\epsilon n + 1$, we have $(1 - \frac{1}{\alpha})|W_j| > 2|M_j|$. Then Inequalities 1 and 2 lead to $d(c_j, c'_j) = 0$. This means $c_j = c'_j$, which contradicts the assumptions. □
Fact 2. Suppose the clustering instance is $(\alpha, \epsilon)$-perturbation resilient and $\min_i |C_i| > (\frac{2\alpha}{\alpha - 1} + 2)\epsilon n + 1$. If $c_i \ne c'_i$, then we have

$$d'_{sum}(c'_i, W_i) \ge \alpha\, d_{sum}(c'_i, W_i \setminus \{c(c'_i)\}),$$
$$d'_{sum}(c'_i, V_i) = \alpha\, d_{sum}(c'_i, V_i),$$
$$d'_{sum}(c'_i, A_i) \ge \alpha\, d_{sum}(c'_i, A_i \setminus \{c(c'_i)\}),$$
$$d'_{sum}(c_i, W_i) = d_{sum}(c_i, W_i),$$
$$d'_{sum}(c_i, V_i) = \alpha\, d_{sum}(c_i, V_i),$$
$$d'_{sum}(c_i, A_i) \le \alpha\, d_{sum}(c_i, A_i \setminus \{c(c'_i)\}) + \alpha(1 + \alpha)\, d(c'_i, c_i).$$

Proof. These can be easily verified by the definition of $d'$. In most cases, $d'(\cdot, \cdot) = \alpha d(\cdot, \cdot)$; the only exceptions are the distances between $p$ and $c(p)$. The detailed verification is presented below.

Since $c'_i \ne c_i$, and by Fact 1(2), we know $c'_i \ne c_j$ $(\forall j)$. So when translating $d'_{sum}(c'_i, C)$ (where $C$ is $W_i$, $V_i$ or $A_i$), we only need to check if $c(c'_i) \in C$. For $W_i$,

$$d'_{sum}(c'_i, W_i) \ge d'_{sum}(c'_i, W_i \setminus \{c(c'_i)\}) = \alpha\, d_{sum}(c'_i, W_i \setminus \{c(c'_i)\}).$$

For $V_i$, since there is no center in $V_i$, $d'_{sum}(c'_i, V_i) = \alpha\, d_{sum}(c'_i, V_i)$. For $A_i$,

$$d'_{sum}(c'_i, A_i) \ge d'_{sum}(c'_i, A_i \setminus \{c(c'_i)\}) = \alpha\, d_{sum}(c'_i, A_i \setminus \{c(c'_i)\}).$$

Now consider the sum of distances concerning $c_i$. For $d'_{sum}(c_i, W_i)$ and $d'_{sum}(c_i, V_i)$, they follow from the definition of $d'$. For $A_i$, if $c(c'_i) \in A_i$, then

$$d'_{sum}(c_i, A_i) = d'_{sum}(c_i, A_i \setminus \{c(c'_i)\}) + d'(c_i, c(c'_i))$$
$$\le \alpha\, d_{sum}(c_i, A_i \setminus \{c(c'_i)\}) + \alpha\, d(c_i, c(c'_i))$$
$$\le \alpha\, d_{sum}(c_i, A_i \setminus \{c(c'_i)\}) + \alpha(1 + \alpha)\, d(c'_i, c_i),$$

where the last inequality comes from Fact 1(1). If $c(c'_i) \notin A_i$, then

$$d'_{sum}(c_i, A_i) = d'_{sum}(c_i, A_i \setminus \{c(c'_i)\}) \le \alpha\, d_{sum}(c_i, A_i \setminus \{c(c'_i)\})$$

where the inequality comes from the definition of $d'$. □

Claim 1. Suppose the clustering instance is $(\alpha, \epsilon)$-perturbation resilient, and $\min_i |C_i| > (\frac{2\alpha}{\alpha - 1} + 2)\epsilon n + 1$. Then for each $i$,

$$d_{sum}(c_i, W_i) \ge \frac{\alpha}{\alpha + 1}\big(|W_i| - |V_i| - |A_i| - (\alpha + 1)\big)\, d(c_i, c'_i)$$

where $W_i = (C_i \cap C'_i) \setminus \hat{B}_i$, $V_i = \hat{B}_i \setminus M_i$. This means $d(c_i, c'_i)$ can be bounded approximately by the average distance between $c_i$ and a large portion of $C_i$.

Proof. Here we will prove an upper bound for the distance between the new center and the old center. The high level intuition is that under $d'$, $c'_i$ is the optimal center for $C'_i$, so it has no more cost than $c_i$ on $C'_i$. Since $W_i$ is much larger than $V_i$ and $A_i$, if $c'_i$ has much more cost on $W_i$ than $c_i$, then $c'_i$ cannot compensate the cost on $V_i$ and $A_i$. So in order not to have much more cost on $W_i$, by the triangle inequality $c'_i$ should be close to $c_i$. In the following we give the detailed proof.
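If $c'_i = c_i$, then $d(c'_i, c_i) = 0$, which immediately implies the bound. Otherwise, we need to use the fact that $c'_i$ has smaller cost than $c_i$ on $C'_i$ under $d'$. Formally, we have $d'_{sum}(c'_i, C'_i) \le d'_{sum}(c_i, C'_i)$, i.e.

$$d'_{sum}(c'_i, W_i) + d'_{sum}(c'_i, V_i) + d'_{sum}(c'_i, A_i) \le d'_{sum}(c_i, W_i) + d'_{sum}(c_i, V_i) + d'_{sum}(c_i, A_i).$$

Translating $d'$ to $d$ by Fact 2, we have

$$\alpha\, d_{sum}(c'_i, W_i \setminus \{c(c'_i)\}) - d_{sum}(c_i, W_i) \le \alpha\, d_{sum}(c_i, V_i) + \alpha\, d_{sum}(c_i, A_i \setminus \{c(c'_i)\}) + \alpha(1 + \alpha)\, d(c'_i, c_i) - \alpha\, d_{sum}(c'_i, V_i) - \alpha\, d_{sum}(c'_i, A_i \setminus \{c(c'_i)\}).$$

By the triangle inequality,

$$\alpha\, d(c_i, c'_i)(|W_i| - 1) - (\alpha + 1)\, d_{sum}(c_i, W_i) \le \alpha\, d(c_i, c'_i)\big[|V_i| + (|A_i| - 1) + (1 + \alpha)\big]$$

which implies the desired result. □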

Claim 2. Suppose the clustering instance is $(\alpha, \epsilon)$-perturbation resilient, and $\min_i |C_i| > (2 + \frac{2\alpha}{\alpha - 1})\epsilon n + 1$. Let $I_i = 1$ if $c_i \ne c'_i$ and $I_i = 0$ otherwise. Then

$$(\alpha - 1) \sum_{1 \le i \le k} I_i\, d_{sum}(c_i, W_i) \le \alpha \sum_{1 \le i \le k} (|A_i| + \alpha + 1 + |M_i|)\, d(c_i, c'_i)$$

where $W_i = (C_i \cap C'_i) \setminus \hat{B}_i$.

Proof. Here we will prove a lower bound for the weighted sum of the distances between the new centers and the old centers. The high level intuition is as follows: the clustering that under $d'$ assigns points in $C'_i \setminus \hat{B}_i = W_i \cup A_i$ to $c_i$ and points $p$ in $V_i$ to $c(p)$ saves as much cost as $\sum_i [d'_{sum}(c'_i, W_i) - d'_{sum}(c_i, W_i)] \approx (\alpha - 1)\sum_i d_{sum}(c_i, W_i)$ on $W_i$ compared to the optimum clustering under $d'$, if $c'_i \ne c_i$. So the optimum clustering under $d'$ must save this cost on other parts of points. If the distances between $\{c'_i\}$ and $\{c_i\}$ are all small, then $\{c'_i\}$ could not save much cost compared to $\{c_i\}$. So the weighted sum of the distances between $\{c'_i\}$ and $\{c_i\}$ should be large. Formally, we have the following inequality from the fact that $\{c'_i\}$ are the optimal centers under $d'$, and thus have no more cost than the clustering that under $d'$ assigns points in $W_i \cup A_i$ to $c_i$ and points $p$ in $V_i$ to $c(p)$:

$$\sum_i d'_{sum}(c'_i, C'_i) \le \sum_i \Big[d'_{sum}(c_i, C'_i \setminus V_i) + \sum_{p \in V_i} d'(c(p), p)\Big]. \qquad (3)$$

In the following we will show the detailed proof. In the proof, to estimate $d'_{sum}(c'_i, W_i) - d'_{sum}(c_i, W_i) = \alpha\, d_{sum}(c'_i, W_i) - d_{sum}(c_i, W_i)$ when $c'_i \ne c_i$, we will need the following fact: $\{c_i\}$ are the optimal centers under $d$, so $d_{sum}(c_i, W_i)$ could not be much larger than $d_{sum}(c'_i, W_i)$. Formally, we need Inequality 4, which comes from the fact that $\{c_i\}$ are the optimal centers under $d$:

$$\sum_i d_{sum}(c_i, C_i) \le \sum_i d_{sum}(c'_i, C_i). \qquad (4)$$

Now, as a first step, we combine the two inequalities. For Inequality 3, we need to divide $C'_i$ into $A_i$, $V_i$ and $W_i$, and divide $C'_i \setminus V_i$ into $A_i$ and $W_i$. For Inequality 4, we need to multiply it by $\alpha$ and divide $C_i$ into three parts $M_i$, $V_i$ and $W_i$. Add them up and keep the terms on the same part of points in the same group:

$$\sum_i \Big[\; d'_{sum}(c_i, A_i) - d'_{sum}(c'_i, A_i)$$
$$+\; \alpha\, d_{sum}(c'_i, M_i) - \alpha\, d_{sum}(c_i, M_i)$$
$$+\; \alpha\, d_{sum}(c'_i, V_i) - d'_{sum}(c'_i, V_i) + \sum_{p \in V_i} d'(c(p), p) - \alpha\, d_{sum}(c_i, V_i)$$
$$+\; \alpha\, d_{sum}(c'_i, W_i) - d'_{sum}(c'_i, W_i) + d'_{sum}(c_i, W_i) - \alpha\, d_{sum}(c_i, W_i) \Big] \ge 0.$$

Notice that the last line consists essentially of the terms needed in estimating $d'_{sum}(c'_i, W_i) - d'_{sum}(c_i, W_i)$. The remaining terms approximate the cost compensated on $A_i$, $M_i$ and $V_i$, which can be bounded in terms of $d(c_i, c'_i)$ by the triangle inequality. Then we can compare $d(c_i, c'_i)$ to the estimated $d'_{sum}(c'_i, W_i) - d'_{sum}(c_i, W_i)$.
Formally, as a second step, we need to first translate $d'$ into $d$, and then apply the triangle inequality to the terms on $A_i$, $M_i$ and $V_i$. Rewrite the inequality as $\sum_i T_i \ge 0$, where $T_i$ are all the terms related to $i$. In the easy case when $c'_i = c_i$, most terms cancel out, and we get $T_i = \sum_{p \in V_i} d(p, c(p)) - \alpha\, d_{sum}(c_i, V_i)$. Then $T_i \le (\alpha - \alpha)\, d_{sum}(c_i, V_i) = 0$ since $V_i$ are selected bad points. If $c'_i \ne c_i$, Fact 2 leads to

$$T_i \le \alpha\, d_{sum}(c_i, A_i \setminus \{c(c'_i)\}) + \alpha(1 + \alpha)\, d(c'_i, c_i) - \alpha\, d_{sum}(c'_i, A_i \setminus \{c(c'_i)\})$$
$$+\; \alpha\, d_{sum}(c'_i, M_i) - \alpha\, d_{sum}(c_i, M_i)$$
$$+\; \alpha\, d_{sum}(c'_i, V_i) - \alpha\, d_{sum}(c'_i, V_i) + \sum_{p \in V_i} d(p, c(p)) - \alpha\, d_{sum}(c_i, V_i)$$
$$+\; \alpha\, d_{sum}(c'_i, W_i) - \alpha\, d_{sum}(c'_i, W_i \setminus \{c(c'_i)\}) + (1 - \alpha)\, d_{sum}(c_i, W_i).$$

The first line is bounded by $\alpha(\alpha + |A_i|)\, d(c_i, c'_i)$. The second line is bounded by $\alpha\, d(c_i, c'_i)|M_i|$. The third line is bounded by 0, since $V_i$ are selected bad points and thus $\sum_{p \in V_i} d(p, c(p)) \le \alpha\, d_{sum}(c_i, V_i)$. For the fourth line, if $c(c'_i) \notin W_i$, then it is $0 + (1 - \alpha)\, d_{sum}(c_i, W_i)$; otherwise, $c(c'_i) = c_i$, and then it is $\alpha\, d(c_i, c'_i) + (1 - \alpha)\, d_{sum}(c_i, W_i)$. In conclusion, in any case we have the following bound for $T_i$:

$$T_i \le \alpha(|A_i| + |M_i| + \alpha + 1)\, d(c_i, c'_i) + (1 - \alpha) I_i\, d_{sum}(c_i, W_i).$$

Then $\sum_i T_i \ge 0$ implies the desired result. □

Theorem|3l Suppose the clustering instance is (a, e)-perturbation resilient and miiij |Cj| > (2+ ■^^)e-n + 
Then \B\<en. 

Proof. Claim 1 shows an upper bound for the distance between the new center and the old center of each optimal cluster, and Claim 2 shows a lower bound for the weighted sum of the distances between the new centers and the old centers when $c'_i \ne c_i$ for some $i$. However, the two bounds lead to a contradiction when $\min_i |C_i|$ is sufficiently large, so the centers should not move after the perturbation. This then implies that in the optimal clustering under $d'$ each point $p$ is assigned to the center $c(p)$, and therefore the selected bad points $\hat{B}$ will move from their original optimal clusters. Using this and the $(\alpha, \epsilon)$-perturbation resilience property we get an upper bound on the number of the selected bad points. This can also be proved to be a bound on the number of all the bad points due to the way we construct $\hat{B}$.
Formally, combining Claims 1 and 2, we get

$$\sum_{1 \le i \le k} \alpha\, d(c_i, c'_i)\Big[(|A_i| + \alpha + 1 + |M_i|) - \frac{\alpha - 1}{\alpha + 1}\big(|(C_i \cap C'_i) \setminus \hat{B}_i| - |\hat{B}_i \setminus M_i| - |A_i| - (\alpha + 1)\big) I_i\Big] \ge 0.$$

If $I_i = 0$, we have $d(c_i, c'_i) = 0$; if $I_i = 1$, then since $|C_i|$ exceeds the bound assumed in the theorem, the coefficient of $d(c_i, c'_i)$ is negative. So the left hand side is no greater than 0. Therefore, all terms are equal to 0, i.e. for all $1 \le i \le k$, $d(c_i, c'_i) = 0$. Then points in $\hat{B}_i$ will move to other clusters after perturbation, which means that $\hat{B}_i \subseteq M_i$, and thus $\hat{B} \subseteq M$. Then $|\hat{B}| \le |M| \le \epsilon n$. In particular, $|\hat{B}_i| \le \epsilon n$ for any $i$. Then $|B_i| \le \epsilon n$, since otherwise $|\hat{B}_i|$ would be $\epsilon n + 1$. So $\hat{B}_i = B_i$, and hence $\hat{B} = B$ and $|B| = |\hat{B}| \le \epsilon n$. □

Note: The bound in Theorem 3 is an optimal bound for the bad points in the sense that for any $\alpha > 1$, $\epsilon < \frac{1}{3}$, we can easily construct an $(\alpha, \epsilon)$-perturbation resilient 2-median instance which has $\epsilon n$ bad points. The construction is as follows. Construct 3 groups of points: $G_1$, $G_2$ and $B$. Each of $G_1$ and $G_2$ has $(1 - \epsilon)n/2$ points, and $B$ has $\epsilon n$ points. The distances within the same group are 1, while those between the points in $G_1$ and $G_2$ are $M$, those between the points in $B$ and $G_1$ are $M/(\alpha + 1) + 1$, and those between the points in $B$ and $G_2$ are $\alpha M/(\alpha + 1) - 1$. The instance satisfies the triangle inequality, which can be verified by a case-by-case study. When $M$ is sufficiently large, the optimal clustering before perturbation has one center in $G_1$ and the other in $G_2$, and the points in $B$ are then trivially bad points. So it is sufficient to show that the instance is $(\alpha, \epsilon)$-perturbation resilient. Notice that the optimal cost before perturbation is at most $(1 - \epsilon)n + \epsilon n(\frac{M}{\alpha + 1} + 1)$, so the optimal cost after perturbation is no more than $\alpha[(1 - \epsilon)n + \epsilon n(\frac{M}{\alpha + 1} + 1)]$. Suppose that after perturbation both of the two centers come from $G_1 \cup B$ or both come from $G_2$. Then the cost is larger than $\frac{(1 - \epsilon)n}{2}(\frac{\alpha M}{\alpha + 1} - 1) > \alpha[(1 - \epsilon)n + \epsilon n(\frac{M}{\alpha + 1} + 1)]$ when $M$ is sufficiently large. So in the optimal clustering after perturbation, it must be that one center is from $G_1 \cup B$ and the other is from $G_2$, and then $G_1$ and $G_2$ remain in different clusters. This means the instance is $(\alpha, \epsilon)$-perturbation resilient.
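The three-group construction in this note is small enough to check by brute force. The sketch below builds the instance for given group sizes, tests the triangle inequality over all triples, and finds an optimal pair of centers by exhaustive search. The function names are hypothetical, and the notion of a bad point used in the comment (a point $p \in C_i$ is good when $\alpha\, d(p, c_i) < d(p, c_j)$ for every other center $c_j$) is assumed from the usual definition rather than restated in this section.

```python
from itertools import combinations, permutations


def tight_instance(group_size, bad_size, alpha, M):
    """Labels and distance function of the instance with groups G1, G2, B."""
    labels = ["G1"] * group_size + ["G2"] * group_size + ["B"] * bad_size
    cross = {
        frozenset(("G1", "G2")): M,
        frozenset(("B", "G1")): M / (alpha + 1) + 1,
        frozenset(("B", "G2")): alpha * M / (alpha + 1) - 1,
    }

    def d(i, j):
        if i == j:
            return 0.0
        if labels[i] == labels[j]:
            return 1.0
        return cross[frozenset((labels[i], labels[j]))]

    return labels, d


def is_metric(n, d, tol=1e-9):
    """Brute-force triangle inequality check over all ordered triples."""
    return all(d(i, k) <= d(i, j) + d(j, k) + tol
               for i, j, k in permutations(range(n), 3))


def best_two_centers(n, d):
    """Exhaustive 2-median: the pair of centers with the smallest total cost."""
    return min(combinations(range(n), 2),
               key=lambda cs: sum(min(d(x, c) for c in cs) for x in range(n)))

# With alpha = 3 and M = 100 the B-to-G1 distance is 26 and the B-to-G2
# distance is 74, so alpha * 26 = 78 > 74: every point of B is bad
# under the assumed definition.
```

For example, with `group_size=4`, `bad_size=2`, `alpha=3` and `M=100`, the exhaustive search places one center in $G_1$ and one in $G_2$, as the note claims for large $M$.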



B Full Proof of Theorem 4

To prove the theorem, we first show the properties of the linkage and cleaning phases in Lemma 3 and Lemma 4 respectively. Before that we present some facts that are used in the analysis. For each $i$, let

$$q_i^* = \arg\max_{q_i \in C_i \setminus B} d(c_i, q_i).$$

Fact 3. Suppose $\alpha > 2 + \sqrt{7}$, $\epsilon \le \rho/5$. Let $C_i$, $C_j$ $(j \ne i)$ be two different optimal clusters.

(1) For any good point $p_j \in C_j \setminus B$, $d(c_i, p_j) > 3d(c_i, q_i^*)$;

(2) For any point $p_i \in \mathbb{B}(c_i, d(c_i, q_i^*))$ and any good point $p_j \in C_j \setminus B$, $d(p_i, p_j) > 2d(c_i, q_i^*)$;

(3) If $\mathcal{C}' \setminus B$ is laminar to $\mathcal{C} \setminus B$ before checking $d(p, q)$, and $C \in U_{p,q}$ contains good points from $C_i$, then $C \cap \mathbb{B}(p, d(p, q))$ contains good points from $C_i$.

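Proof. (1) First we have

$$d(c_i, q_i^*) < \frac{1}{\alpha - 1}\, d(c_i, c_j).$$

Since $\alpha\, d(c_j, p_j) < d(c_i, p_j)$, we have

$$\frac{\alpha + 1}{\alpha}\, d(c_i, p_j) > d(c_i, p_j) + d(c_j, p_j) \ge d(c_i, c_j).$$

When $\alpha > 2 + \sqrt{7}$, we have

$$d(c_i, p_j) > \frac{\alpha}{\alpha + 1}\, d(c_i, c_j) > \frac{3}{\alpha - 1}\, d(c_i, c_j) > 3d(c_i, q_i^*).$$

(2) This follows from (1) since

$$d(p_i, p_j) \ge d(c_i, p_j) - d(c_i, p_i) > 3d(c_i, q_i^*) - d(c_i, q_i^*) = 2d(c_i, q_i^*).$$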
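The role of the constant $2 + \sqrt{7}$ can be checked numerically. Assuming the constants in the argument are $\frac{\alpha}{\alpha+1}$ and $\frac{3}{\alpha-1}$ (which is what the triangle-inequality steps suggest), the requirement $\frac{\alpha}{\alpha+1} \ge \frac{3}{\alpha-1}$ is equivalent to $\alpha^2 - 4\alpha - 3 \ge 0$, whose positive root is $2 + \sqrt{7}$. The helper name below is hypothetical.

```python
import math


def margin_factor_gap(alpha):
    """alpha/(alpha+1) - 3/(alpha-1); for alpha > 1 this is non-negative
    exactly when alpha**2 - 4*alpha - 3 >= 0."""
    return alpha / (alpha + 1) - 3 / (alpha - 1)


# Positive root of alpha^2 - 4*alpha - 3 = 0.
THRESHOLD = 2 + math.sqrt(7)
```

Evaluating `margin_factor_gap` just below and just above `THRESHOLD` shows the sign change, which is why a factor of 3 in part (1) is available only once $\alpha$ exceeds this value.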
(3) If $|C| = 1$, then $C \subseteq \mathbb{B}(p, d(p, q))$, and we are done. Otherwise, $C$ must be generated from some merge step, and thus $|C| > 2\epsilon n$. We have $|B| \le \epsilon n$, so $C$ contains more than $\epsilon n$ good points. If $C$ contains good points only from $C_i$, then $C$ contains more than $\epsilon n$ good points from $C_i$; this is also true when $C$ contains good points from another optimal cluster, since in this case we have $C_i \setminus B \subseteq C \setminus B$ from the laminarity assumption. In either case, since $C$ has at most $\epsilon n$ points outside the ball $\mathbb{B}(p, d(p, q))$, there must be at least one good point from $C_i$ in $C \cap \mathbb{B}(p, d(p, q))$. □

Lemma 3. If $\alpha > 2 + \sqrt{7}$, $\epsilon \le \rho/5$, then the tree $T$ contains nodes $N_i$ $(1 \le i \le k)$ such that $N_i \setminus B = C_i \setminus B$.

Proof. The proof follows from two key facts:

(1) If $\mathcal{C}' \setminus B$ is laminar to $\mathcal{C} \setminus B$ right before checking some $d(p, q)$, then for any $i, j$, $i \ne j$, such that either $d(p, q)$ is checked before $d(c_i, q_i^*)$, or $d(p, q)$ is checked before $d(c_j, q_j^*)$, or $p = c_i, q = q_i^*$, or $p = c_j, q = q_j^*$, we have that $U_{p,q}$ cannot contain good points from both $C_i$ and $C_j$.

(2) If $\mathcal{C}' \setminus B$ is laminar to $\mathcal{C} \setminus B$ right before checking $d(c_i, q_i^*)$, then right after checking $d(c_i, q_i^*)$ there is a cluster containing all the good points in cluster $i$ and no other good points.

Initially, $\mathcal{C}' \setminus B$ is laminar to $\mathcal{C} \setminus B$. Consider any merge step corresponding to some $d(p, q)$ such that $U_{p,q}$ contains good points from both $C_i$ and $C_j$ $(j \ne i)$. Fact (1) implies that both $d(c_i, q_i^*)$ and $d(c_j, q_j^*)$ must have been checked, and then fact (2) implies that all the good points in $C_i$ and $C_j$ respectively have already been merged. So $\mathcal{C}' \setminus B$ is always laminar to $\mathcal{C} \setminus B$. Then the lemma follows from fact (2).

We now prove fact (1). Suppose for contradiction that there exist good points from $C_i$ and $C_j$ in $U_{p,q}$. By Fact 3(3), there exist good points $p_i \in C_i$ and $p_j \in C_j$ in $\mathbb{B}(p, d(p, q))$. By Fact 3(2), $2d(c_i, q_i^*) < d(p_i, p_j)$, and by the triangle inequality $d(p_i, p_j) \le d(p, p_i) + d(p, p_j) \le 2d(p, q)$, so $d(p, q) > d(c_i, q_i^*)$. The same argument leads to $d(p, q) > d(c_j, q_j^*)$. This contradicts the assumption that $d(p, q)$ is checked before $d(c_i, q_i^*)$ or before $d(c_j, q_j^*)$, or $p = c_i, q = q_i^*$, or $p = c_j, q = q_j^*$.

We now prove fact (2). It is sufficient to show that $\bigcup_{C \in U_{c_i, q_i^*}} C \setminus B = C_i \setminus B$ and that $U_{c_i, q_i^*}$ satisfies the approximate closure condition. First, $U_{c_i, q_i^*}$ contains no good points outside $C_i$ by fact (1). Second, any $C$ containing good points from $C_i$ is in $U_{c_i, q_i^*}$. If $|C| = 1$, then this is trivial. Otherwise, $C$ must be formed by a merge step, and then by fact (1), $C$ has no good points outside $C_i$. Since $\mathbb{B}(c_i, d(c_i, q_i^*))$ contains all good points in $C_i$, $C$ has only bad points outside the ball, so we have $C \in U_{c_i, q_i^*}$.

We finally show that $U_{c_i, q_i^*}$ satisfies the approximate closure condition. The approximate coverage condition is satisfied since, in addition to all good points in $C_i$, $\bigcup_{C \in U_{c_i, q_i^*}} C$ can only contain bad points, so it has at most $\epsilon n$ points outside $\mathbb{B}(c_i, d(c_i, q_i^*))$. To show the approximate margin condition, we use Fact 3(2): for $\alpha > 2 + \sqrt{7}$, $2d(c_i, q_i^*)$ is smaller than the distance between any point in $\mathbb{B}(c_i, d(c_i, q_i^*))$ and any good point outside $C_i$. Then by eliminating all bad points outside $\mathbb{B}(c_i, d(c_i, q_i^*))$, we get the approximate margin condition. We also have $|\bigcup_{C \in U_{c_i, q_i^*}} C| \ge |C_i \setminus B| \ge \min_j |C_j| - \epsilon n$. □

Lemma 4. If $\alpha > 2 + \sqrt{7}$, $\epsilon \le \epsilon'\rho/5$ where $\epsilon' \le 1$, then $\tilde{\mathcal{C}}$ is a $(1 + \epsilon')$-approximation to the optimum.

Proof. It is sufficient to first prove that there is a pruning $\mathcal{P}'$ in $T'$ containing only $k$ nodes $N'_i$ $(1 \le i \le k)$ such that $N'_i \setminus B = C_i \setminus B$, and then prove that its cost is approximately $\mathcal{OPT}$.

By Lemma 3, the tree $T$ has a pruning $\mathcal{P}$ which contains $N_i$ $(1 \le i \le k)$ and possibly some bad points, such that $N_i \setminus B = C_i \setminus B$. We now show that any singleton good point $p$ in $N_i$ in $T'$ remains in $N_i$ after cleaning, so $\mathcal{P}$ becomes the desired $\mathcal{P}'$. For any non-singleton leaf $L_i$ under $N_i$ and any non-singleton leaf $L_j$ under $N_j$ $(j \ne i)$, since they have more good points than bad points, we can find good points $p_i \in L_i$ and $p_j \in L_j$ such that

$$\mathrm{median}\{d(p, q), q \in L_j\} \ge d(p, p_j) > d(p, p_i) \ge \mathrm{median}\{d(p, q), q \in L_i\}.$$

So $p$ is assigned to some leaf node under $N_i$.

Therefore, $\mathcal{P}$ becomes $\mathcal{P}' = \{N_i', 1 \le i \le k\}$ such that $N_i' \setminus B = C_i \setminus B$. It is sufficient to prove that the cost of $\mathcal{P}'$ approximates $\mathcal{OPT}$, i.e. to bound the increase of cost caused by a bad point $p_j \in C_j$ ending up in $N_i'$ ($i \ne j$). There are two cases: $p_j$ belongs to a non-singleton leaf node in $T'$, or $p_j$ is a singleton in $T'$. In the first case, suppose $p_j$ is merged into the leaf node when checking $d(p,q)$. Then according to the approximate margin condition, there are at least $\min_i |C_i| - 3\epsilon n$ good points $p_{it}$ from $C_i$ inside $\mathbb{B}(p, d(p,q))$, and the same number of good points $p_{js}$ from $C_j$ outside $\mathbb{B}(p, d(p,q))$, such that $d(p_j, p_{it}) \le d(p_j, p_{js})$. In the second case, since $p_j$ is assigned to the nearest non-singleton leaf node according to median distance, there are at least $K = (\min_i |C_i| - \epsilon n)/2 - \epsilon n$ good points $p_{it}$ from $C_i$ inside the leaf node, and the same number of good points $p_{js}$ from $C_j$ outside the leaf node, such that $d(p_j, p_{it}) \le d(p_j, p_{js})$. In either case, the increase of cost due to $p_j$ can be bounded as follows.

\[ d(p_j, c_i) - d(p_j, c_j) \le \frac{1}{K}\Big( \sum_{1 \le t \le K} [d(p_j, p_{it}) + d(p_{it}, c_i)] - \sum_{1 \le s \le K} [d(p_j, p_{js}) - d(p_{js}, c_j)] \Big) \le \frac{1}{K}\,\mathcal{OPT}. \]

As $|B| \le \epsilon n$, the cost of $\mathcal{P}'$ is at most $(1 + \frac{\epsilon n}{K})\mathcal{OPT}$, so the minimum cost pruning $\tilde{\mathcal{C}}$ is a $\big(1 + \frac{2\epsilon n}{\min_i |C_i| - 3\epsilon n}\big)$-approximation to the optimum. By setting $\epsilon' \ge \frac{2\epsilon n}{\min_i |C_i| - 3\epsilon n}$, we get the desired result. □



Theorem 4. Suppose the clustering instance is $(\alpha, \epsilon)$-perturbation resilient to $k$-median. If $\alpha > 2 + \sqrt{7}$ and $\epsilon \le \rho/5$ where $\rho = (\min_i |C_i| - 15)/n$, then in polynomial time, Algorithm 2 outputs a tree $T$ that contains a pruning that is $\epsilon$-close to the optimum clustering. Moreover, if $\epsilon \le \rho\epsilon'/5$ where $\epsilon' \le 1$, the clustering produced is a $(1 + \epsilon')$-approximation to the optimum.

Proof. The theorem follows immediately from Lemmas 3 and 4.

Running Time Sorting the distances takes O(n^). Computing M{p,d{p,q)) and Up^q takes 0{n^). The 
step of checking if M{p,d{p,q)) satisfies the approximate closure condition needs more careful analysis. 
The approximate coverage condition can be verified in $O(n)$. The approximate margin condition can be verified in $O(n^3)$ as follows: for each $q_1 \in S \setminus \mathbb{B}(p, d(p,q))$, enumerate all $p_1, p_2 \in \mathbb{B}(p, d(p,q))$ to check whether some pair violates the requirement $d(p_1, p_2) < d(p_1, q_1)$, and if so, mark $q_1$; then the approximate margin condition is satisfied if and only if the number of marked points is no more than
en. So in total, the linkage phase takes O(n^). The cleaning phase takes time O(n^), and the dynamic 
programming takes time 0(n'^). Therefore, the running time to get the approximation is O(n^). □ 
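The marking procedure for the approximate margin condition can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `approx_margin_holds`, `ball`, `outside` and `eps_n` are hypothetical, `dist` is assumed to be a full distance matrix, and `ball` and `outside` are assumed to be precomputed index lists for the points inside and outside the ball.

```python
from itertools import product


def approx_margin_holds(dist, ball, outside, eps_n):
    """Return True if at most eps_n points outside the ball get marked.

    A point q1 outside the ball is marked when some pair p1, p2 inside the
    ball violates the requirement d(p1, p2) < d(p1, q1).
    """
    marked = 0
    for q1 in outside:
        # Enumerate all ordered pairs (p1, p2) of points in the ball.
        if any(dist[p1][p2] >= dist[p1][q1]
               for p1, p2 in product(ball, repeat=2)):
            marked += 1
    return marked <= eps_n
```

Each call enumerates at most $n$ outside points and $n^2$ pairs, which matches the cubic cost of a single margin check described above.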

C Full Proof of Theorem 5

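Theorem 5. Suppose $(X, d)$ is $(\alpha, \epsilon)$-perturbation resilient for $\alpha > 8$, $\epsilon < \rho/20$. Let $0 < \lambda < 1$. Then with probability at least $1 - \delta$, we can get an implicit clustering that is a $2\frac{1+\lambda}{1-\lambda}\big(1 + \frac{8\epsilon}{\rho - 12\epsilon}\big)$-approximation in time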

0((^lnf)5). 

Proof. We sample a set S of size n = In ^) and run Algorithm |2]on S to obtain the minimum cost 

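pruning $\tilde{\mathcal{C}}$ and its centers $\tilde{c}$. We then output the implicit clustering of the whole space $X$ that assigns each point in $X$ to its nearest neighbor in $\tilde{c}$. We will show that the cost of this clustering is close to the optimum, i.e. $\Phi_X(\tilde{c}) \le 2\frac{1+\lambda}{1-\lambda}\big(1 + \frac{8\epsilon}{\rho - 12\epsilon}\big)\Phi_X(c)$.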
To compare the two, since $\tilde{c}$ is computed on $S$, we first approximate them using costs on $S$. Formally,
for every set of centers c, E($5(c)/n) = ^x{c)/N, so if n > ^'"2^^^^?'^ » the Chernoff bound leads to 



Pr 



n 



N 



> A 



N 



Pr 



<^>s{c)/n-^x{c)/N 



< 2e 



< 



D 
6 



> A 



^x{c)/N' 



D 



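By the union bound, we have with probability at least $1 - \delta/4$, for any $k$-median centers $c$, $\Phi_S(c)/n \approx \Phi_X(c)/N$. Specifically, $(1 - \lambda)\Phi_X(\tilde{c})/N \le \Phi_S(\tilde{c})/n$ and $\Phi_S(c)/n \le (1 + \lambda)\Phi_X(c)/N$. So it is sufficient to show $\Phi_S(\tilde{c}) \le 2\big(1 + \frac{8\epsilon}{\rho - 12\epsilon}\big)\Phi_S(c)$. However, $\tilde{\mathcal{C}}$ may be different from $\mathcal{C} \cap S$, so we need a "bridge" for the two.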

Now we turn to analyze Algorithm 2 on $S$ to find such a bridge. First, $X$ has at most $\epsilon N$ bad points, so when $n$ is sufficiently large, with probability at least $1 - \delta/4$, $S$ has at most $2\epsilon n$ bad points. Similarly, when $n$ is sufficiently large, with probability at least $1 - \delta/4$, for any $i$, $|C_i \cap S| \ge 5 \times (2\epsilon n) + 15$. Moreover, for any good points $p_1, p_2 \in C_i \cap S \setminus B$ and $q_1 \in C_j \cap S \setminus B$ ($j \ne i$), we have



\[ d(p_1, p_2) \le d(c_i, p_1) + d(c_i, p_2) \le \frac{d(p_1, q_1) + d(p_2, q_1)}{\alpha - 1} \le \frac{2d(p_1, q_1) + d(p_1, p_2)}{\alpha - 1}, \]

and thus 3d{pi,p2) < d{pi,qi) when a > 8. Then we can choose an arbitrary good point Cj from CiH S 
to be the center of Cj n S, so that even if q is not sampled, we still have that the tree T in Algorithm |2] 
has nodes Ni such that Ni \ B = dn S \ B. This is because UceC/- \ B = Cir\S\B where 
q* = arg min^g(^^pi5'^^ d(cj, g), and f/ci,g* satisfies approximate coverage and margin conditions since the 
distance between any good point outside Cj and any point in the ball M{ci, q*) is 2 times larger than d{ci, q*). 
So Algorithm 2 can successfully produce a tree $T$ with a pruning $\mathcal{P}' = \{N_1', N_2', \ldots, N_k'\}$, such that $N_i' \setminus B = C_i \cap S \setminus B$. Suppose in $S$, $c' = \{c_1', c_2', \ldots, c_k'\}$ are the optimal centers for $\mathcal{P}'$. Then we can use $\Phi_S(\mathcal{P}', c')$ as a bridge for comparing $\Phi_S(\tilde{c})$ and $\Phi_S(c)$.

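On one hand, we have $\Phi_S(\tilde{c}) \le \Phi_S(\mathcal{P}', c')$. First, since $\tilde{\mathcal{C}}$ is the minimum cost pruning, $\Phi_S(\tilde{\mathcal{C}}, \tilde{c}) \le \Phi_S(\mathcal{P}', c')$. Second, since in $\Phi_S(\tilde{c})$ each point is assigned to its nearest center but in $\Phi_S(\tilde{\mathcal{C}}, \tilde{c})$ this may not be true, $\Phi_S(\tilde{c}) \le \Phi_S(\tilde{\mathcal{C}}, \tilde{c})$.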

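On the other hand, we have $\Phi_S(\mathcal{P}', c') \le 2\big(1 + \frac{8\epsilon}{\rho - 12\epsilon}\big)\Phi_S(c)$. First, $\Phi_S(\mathcal{P}', c)$ is different from $\Phi_S(c)$ only on the bad points. Using an approach similar to that in Lemma 4, we can show that any bad point $p_j \in C_j \cap S$ which ends up in the wrong cluster $N_i'$ ($i \ne j$) must be closer to $(\min_i |C_i \cap S| - 6\epsilon n)/2 \ge (\min_i |C_i|/(2N) - 6\epsilon)n/2$ good points in $N_i'$ than to the same number of good points in $C_j$, and thus we have $\Phi_S(\mathcal{P}', c) \le \big(1 + \frac{8\epsilon}{\rho - 12\epsilon}\big)\Phi_S(c)$. Second, $\Phi_S(\mathcal{P}', c)$ may be smaller than $\Phi_S(\mathcal{P}', c')$, since $c$ are chosen from $X$ while $c'$ are only chosen from $S$. But by the triangle inequality, for any $N_i' \in \mathcal{P}'$,

\[ 2|N_i'| \sum_{p \in N_i'} d(p, c_i) = \sum_{p \in N_i'} \sum_{q \in N_i'} [d(p, c_i) + d(q, c_i)] \ge \sum_{p \in N_i'} \sum_{q \in N_i'} d(p, q) \ge \sum_{p \in N_i'} \sum_{q \in N_i'} d(q, c_i') = |N_i'| \sum_{q \in N_i'} d(q, c_i'), \]

so we have $\Phi_S(\mathcal{P}', c') \le 2\Phi_S(\mathcal{P}', c)$. □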
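The implicit clustering used in this proof, which assigns every point of the whole space to its nearest computed center, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the centers are assumed to have already been obtained by running the linkage algorithm on the sample (not shown here), and `d` is assumed to be any distance function.

```python
def implicit_clustering(centers, d):
    """Return a function mapping any point to the index of its nearest center."""
    def assign(x):
        return min(range(len(centers)), key=lambda i: d(x, centers[i]))
    return assign


def kmedian_cost(points, centers, d):
    """k-median cost: every point pays the distance to its nearest center."""
    return sum(min(d(x, c) for c in centers) for x in points)
```

The clustering is implicit in the sense that no partition of the whole space is materialized; only the centers are stored and each query point is resolved on demand.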



D Full Proof of Theorem 6

To prove the theorem, first we show that the $\alpha$-perturbation resilience property implies that for any two different optimal clusters $C_i$ and $C_j$ and any $A \subset C_i$, we have $\alpha d_{sum}(A, C_i \setminus A) < d_{sum}(A, C_j)$. This follows by constructing a specific perturbation and using the fact that the optimum does not change after the perturbation. This can be used to show that when $\alpha > 3\frac{\max_i |C_i|}{\min_i |C_i| - 1}$ we have: (1) for any two different optimal clusters $C_i$ and $C_j$ and any $A \subset C_i$, $A' \subset C_j$ such that $\min(|C_i \setminus A|, |C_j \setminus A'|) > \min_i |C_i|/2$, $d_{avg}(A, A') > \min\{d_{avg}(A, C_i \setminus A), d_{avg}(A', C_j \setminus A')\}$; (2) for any point $p \in C_i$, its $|C_i|/2$ nearest neighbors are in the same optimal cluster. These two facts then imply that $\mathcal{C}'$ in Algorithm 3 is always laminar to the optimal clustering, so the minimum cost pruning of $T$ will be the optimal clustering, as desired.
We now formally present a detailed proof, starting with a few useful facts.

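Fact 4. Suppose the clustering instance is $\alpha$-perturbation resilient to the min-sum objective. For any two different optimal clusters $C_i$ and $C_j$ and any $A \subseteq C_i$, $A' \subseteq C_j$, we have the following claims.

(1) $\alpha d_{sum}(A, C_i \setminus A) < d_{sum}(A, C_j)$.

(2) $\alpha d_{sum}(A, C_i \setminus A) < \frac{|C_j|}{|A'|} d_{sum}(A, A') + \frac{|A|}{|A'|} d_{sum}(A', C_j \setminus A')$.

(3) When $\alpha > 3\frac{\max_i |C_i|}{\min_i |C_i| - 1}$ and $|C_i \setminus A|$ and $|C_j \setminus A'|$ are larger than $\min_i |C_i|/2$, we have $d_{avg}(A, A') > \min\{d_{avg}(A, C_i \setminus A), d_{avg}(A', C_j \setminus A')\}$.

(4) When $\alpha > 3\frac{\max_i |C_i|}{\min_i |C_i| - 1}$, for any point $p$, all its $\min_i |C_i|/2$ nearest neighbors are in the same optimal cluster.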

Proof. (1) This follows by considering a specific perturbation and using the fact that the optimum does not change after the perturbation. We define a perturbation as follows: $d'(p, q) = \alpha d(p, q)$ if $p \in A, q \in C_i \setminus A$ or $q \in A, p \in C_i \setminus A$, and $d'(p, q) = d(p, q)$ otherwise. $d'$ is a valid $\alpha$-perturbation of $d$, so the optimal clustering after perturbation should remain the same. Specifically, its cost should be smaller than that of the clustering obtained by replacing $C_i, C_j$ with $C_i \setminus A$, $A \cup C_j$. After canceling the terms common to the two costs, we have $2d'_{sum}(A, C_i \setminus A) < 2d'_{sum}(A, C_j)$, which implies $\alpha d_{sum}(A, C_i \setminus A) < d_{sum}(A, C_j)$.
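The perturbation used for claim (1) can be written out concretely. The sketch below is illustrative only; the function names are hypothetical, `dist` is assumed to be a symmetric distance matrix indexed by point, and the validity check assumes the usual definition of an $\alpha$-perturbation, $d(p,q) \le d'(p,q) \le \alpha d(p,q)$, which is stated outside this section.

```python
def blow_up(dist, A, Ci, alpha):
    """Return d' with d'(p, q) = alpha * d(p, q) across the cut (A, Ci minus A).

    All other pairwise distances are left unchanged.
    """
    A = set(A)
    rest = set(Ci) - A
    n = len(dist)
    d_new = [row[:] for row in dist]
    for p in range(n):
        for q in range(n):
            if (p in A and q in rest) or (q in A and p in rest):
                d_new[p][q] = alpha * dist[p][q]
    return d_new


def is_alpha_perturbation(dist, d_new, alpha):
    """Check d(p, q) <= d'(p, q) <= alpha * d(p, q) for every pair."""
    n = len(dist)
    return all(dist[p][q] <= d_new[p][q] <= alpha * dist[p][q]
               for p in range(n) for q in range(n))
```

Only the distances between $A$ and $C_i \setminus A$ are stretched, which is why exactly those terms survive after the common terms of the two clustering costs cancel.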

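(2) The first claim shows that we can bound $d_{sum}(A, C_i \setminus A)$ by $d_{sum}(A, C_j)$. We can divide $C_j$ into two parts: a subset $A'$ and the remaining points $C_j \setminus A'$. Then from the triangle inequality we can bound $d_{sum}(A, C_j)$ by $d_{sum}(A, A')$ and $d_{sum}(A', C_j \setminus A')$. Formally, we have from the first claim:

\[
\begin{aligned}
\alpha d_{sum}(A, C_i \setminus A) &< d_{sum}(A, C_j) = d_{sum}(A, A') + d_{sum}(A, C_j \setminus A') \\
&= d_{sum}(A, A') + \sum_{p \in A,\, q \in C_j \setminus A'} d(p, q) \\
&\le d_{sum}(A, A') + \frac{1}{|A'|} \sum_{p' \in A'} \sum_{p \in A} \sum_{q \in C_j \setminus A'} [d(p, p') + d(p', q)] \\
&= d_{sum}(A, A') + \frac{|C_j \setminus A'|}{|A'|} d_{sum}(A, A') + \frac{|A|}{|A'|} d_{sum}(A', C_j \setminus A') \\
&= \frac{|C_j|}{|A'|} d_{sum}(A, A') + \frac{|A|}{|A'|} d_{sum}(A', C_j \setminus A'),
\end{aligned}
\]

where the last inequality comes from the triangle inequality.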

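(3) We have from the second claim:

\[ \alpha d_{sum}(A, C_i \setminus A) < \frac{|C_j|}{|A'|} d_{sum}(A, A') + \frac{|A|}{|A'|} d_{sum}(A', C_j \setminus A'), \tag{5} \]

\[ \alpha d_{sum}(A', C_j \setminus A') < \frac{|C_i|}{|A|} d_{sum}(A', A) + \frac{|A'|}{|A|} d_{sum}(A, C_i \setminus A). \tag{6} \]

Divide Inequality 5 by $|A|$, divide Inequality 6 by $|A'|$, and add them up:

\[ \frac{\alpha - 1}{|A'|} d_{sum}(A', C_j \setminus A') + \frac{\alpha - 1}{|A|} d_{sum}(A, C_i \setminus A) < \frac{|C_i| + |C_j|}{|A||A'|} d_{sum}(A, A'). \]

Suppose $d_{avg}(A, A') \le \min\{d_{avg}(A, C_i \setminus A), d_{avg}(A', C_j \setminus A')\}$, then

\[ (\alpha - 1)|C_j \setminus A'|\, d_{avg}(A, A') + (\alpha - 1)|C_i \setminus A|\, d_{avg}(A, A') < (|C_i| + |C_j|)\, d_{avg}(A, A'), \]
\[ (\alpha - 1)|C_j \setminus A'| + (\alpha - 1)|C_i \setminus A| < |C_i| + |C_j|, \]

which is contradictory when $\alpha > 3\frac{\max_i |C_i|}{\min_i |C_i| - 1}$ and $|C_j \setminus A'|$ and $|C_i \setminus A|$ are larger than $\min_i |C_i|/2$.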

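(4) Suppose $p$ comes from the optimal cluster $C_i$. Let $p' = \arg\min_{q \notin C_i} d(p, q)$, and suppose $p'$ comes from the optimal cluster $C_j$. There are two cases. The first case is that $d_{avg}(p, C_i \setminus \{p\}) \ge d_{avg}(p', C_j \setminus \{p'\})$. From this assumption and Fact 4(2) we have

\[ \alpha d_{sum}(p, C_i \setminus \{p\}) < |C_j|\, d(p, p') + |C_j \setminus \{p'\}|\, d_{avg}(p, C_i \setminus \{p\}), \]

which leads to

\[ d_{avg}(p, C_i \setminus \{p\}) < \frac{|C_j|}{\alpha |C_i \setminus \{p\}| - |C_j \setminus \{p'\}|}\, d(p, p'). \]

The second case is that $d_{avg}(p, C_i \setminus \{p\}) < d_{avg}(p', C_j \setminus \{p'\})$; then a similar argument leads to

\[ d_{avg}(p, C_i \setminus \{p\}) < d_{avg}(p', C_j \setminus \{p'\}) < \frac{|C_i|}{\alpha |C_j \setminus \{p'\}| - |C_i \setminus \{p\}|}\, d(p, p'). \]

In both cases, when $\alpha > 3\frac{\max_i |C_i|}{\min_i |C_i| - 1}$, $d_{avg}(p, C_i \setminus \{p\}) < d(p, p')/2$. Let $E = \{q \in C_i : d(p, q) \ge d(p, p')\}$. Then we have

\[ |E|\, d(p, p') \le d_{sum}(p, E) \le d_{sum}(p, C_i \setminus \{p\}) < \frac{|C_i| - 1}{2}\, d(p, p'). \]

This means $|E| < |C_i|/2$, i.e. more than $|C_i|/2$ points in $C_i$ are within distance less than $\min_{q \notin C_i} d(p, q)$. Therefore, for any point $p$, all its $\min_i |C_i|/2$ nearest neighbors are in the same optimal cluster. □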

Theorem 6. For $\big(3\frac{\max_i |C_i|}{\min_i |C_i| - 1}\big)$-perturbation resilient instances, Algorithm 3 finds the optimal min-sum $k$-clustering in polynomial time.

Proof. It is sufficient to show that in Algorithm 3, $\mathcal{C}'$ is always laminar to the optimal clustering $\mathcal{C}$, i.e. for any $A \in \mathcal{C}'$ and $C \in \mathcal{C}$, we have either $A \subseteq C$, or $C \subseteq A$, or $A \cap C = \emptyset$. Then the minimum cost pruning of $T$ will be the optimal clustering, which can be obtained by dynamic programming.
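The laminarity condition is a purely set-theoretic test, so it can be checked directly. The following Python sketch is an illustration with a hypothetical function name, assuming clusters are given as collections of hashable point identifiers.

```python
def is_laminar(current, optimal):
    """True if every current cluster is nested in, contains, or is disjoint
    from every optimal cluster."""
    current = [frozenset(a) for a in current]
    optimal = [frozenset(c) for c in optimal]
    for a in current:
        for c in optimal:
            if not (a <= c or c <= a or a.isdisjoint(c)):
                return False
    return True
```

For instance, a current cluster that takes some but not all points of one optimal cluster together with points of another breaks laminarity, while any union of whole optimal clusters does not.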

Intuitively, Fact 4(4) implies that $\mathcal{C}'$ is laminar initially, and Fact 4(3) can be used to show that the merge steps preserve the laminarity, so $\mathcal{C}'$ is always laminar to the optimal clustering.

Formally, we prove the laminarity by induction. By Fact 4(4), $\mathcal{C}'$ is laminar initially. It is sufficient to prove that if the current clustering is laminar, then the merge step keeps the laminarity. Assume that our current clustering $\mathcal{C}'$ is laminar to the optimal clustering. Consider a merge of two clusters $A$ and $A'$. There are two cases in which laminarity could fail to be satisfied after the merge: (1) they are strict subsets of different optimal clusters, i.e. $A \subset C_i$, $A' \subset C_j \ne C_i$; (2) $A$ is a strict subset of an optimal cluster $C_i$ and $A'$ is the union of one or several other optimal cluster(s). By Fact 4(3), the first case cannot happen. In the second case, for any $E$ that is a subset of $C_i \setminus A$ in the current clustering, we have $d_{avg}(A, E) \ge d_{avg}(A, A')$. We know that $d_{avg}(A, C_i \setminus A)$ is a weighted average of the average distances between $A$ and the clusters that are subsets of $C_i \setminus A$ in the current clustering, so $d_{avg}(A, C_i \setminus A) \ge d_{avg}(A, A')$. Also, $d_{avg}(A, A')$ is a weighted average of the average distances between $A$ and the optimal clusters in $A'$, so there must exist an optimal cluster $C_j \subseteq A'$ such that $d_{avg}(A, C_j) \le d_{avg}(A, A') \le d_{avg}(A, C_i \setminus A)$. This means

\[ d_{sum}(A, C_j) \le \frac{|C_j|}{|C_i \setminus A|}\, d_{sum}(A, C_i \setminus A) \le \alpha\, d_{sum}(A, C_i \setminus A), \]

where the last inequality comes from $\alpha > 3\frac{\max_i |C_i|}{\min_i |C_i| - 1}$ and $|C_i \setminus A| > \min_i |C_i|/2$. This contradicts Fact 4(1). So the merge of the two clusters $A$ and $A'$ will preserve the laminarity.

Running Time Finding the nearest neighbors for each point takes $O(n \log n)$ time, so the step of constructing components in Algorithm 3 takes $O(n^2 \log n)$ time. To compute average distances between clusters, we can record the size of each cluster and $d_{sum}(C_i', C_j')$ for any $C_i', C_j'$ in the current clustering, and update $d_{sum}(C_i' \cup C_j', C_l') = d_{sum}(C_i', C_l') + d_{sum}(C_j', C_l')$ for any other cluster $C_l'$ when merging $C_i'$ and $C_j'$. So the merge steps take $O(n^3)$ time. As dynamic programming takes $O(n^3)$ time, we can find the optimum clustering in $O(n^3)$ time. □
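The bookkeeping described in the running-time argument can be sketched as follows. This is only an illustration of the merge phase with cached $d_{sum}$ values, not a full implementation of Algorithm 3: the function name is hypothetical, the initial clusters are assumed to be given (for example the components built from nearest neighbors), and `dist` is assumed to be a distance matrix.

```python
def average_linkage(dist, clusters):
    """Repeatedly merge the two clusters with minimum average distance.

    Returns the merges performed, in order, as pairs of frozensets.
    """
    clusters = [frozenset(c) for c in clusters]
    # dsum[A, B] caches the sum of d(p, q) over p in A and q in B.
    dsum = {}
    for a in clusters:
        for b in clusters:
            if a != b:
                dsum[a, b] = sum(dist[p][q] for p in a for q in b)
    merges = []
    while len(clusters) > 1:
        a, b = min(
            ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
            key=lambda xy: dsum[xy] / (len(xy[0]) * len(xy[1])),
        )
        merged = a | b
        clusters = [c for c in clusters if c not in (a, b)]
        for c in clusters:
            # d_sum(A u B, C) = d_sum(A, C) + d_sum(B, C): no rescan of points.
            dsum[merged, c] = dsum[c, merged] = dsum[a, c] + dsum[b, c]
        clusters.append(merged)
        merges.append((a, b))
    return merges
```

Each merge scans the cached table once and updates one entry per remaining cluster, so no pairwise distance is recomputed after initialization.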



E Full Proof of Theorem 7

We sample a set S of size n = Q{-^^\x].^) and run Algorithm [3] on S. We then output the implicit 

clustering of the whole space $X$ that assigns each point $p \in X$ to $\tilde{C}_i \in \tilde{\mathcal{C}}$ such that $d_{sum}(p, \tilde{C}_i)$ is minimized. To prove the theorem, we first show that with high probability, $\mathcal{C}'$ in Algorithm 3 is always laminar to $\mathcal{C} \cap S$, and thus $\mathcal{C} \cap S$ is a pruning of the tree. Then we show that $\mathcal{C} \cap S$ is actually the minimum cost pruning $\tilde{\mathcal{C}}$. Using this fact, we prove that the implicit clustering obtained is the optimum clustering $\mathcal{C}$. We now formally present a detailed proof.

Fact 5. Suppose the clustering instance {X, d) is a-perturbation resilient to the min-sum objective where 
a > ^ j^^^Ac^\li - If the size of the sample n = In ^), then with probability at least 1 — 5, C in 

Algorithm\3\is always laminar to C H S. 

Proof. The intuition is that on X, for any I < i j < k, any p G d, we have adsum{p, Ci) < dsum{p, Cj). 
When n is sufficiently large, we can show dsum{p,Ci H S) ^ dsum{p,Ci)\S\/\X\, dsum{p,Cj r\ S) 

dsumip, Cj)\S\/\X\ and mTn^'^c.ris'hi ^ T^n^\c\\-i ' ^"'^ ^^^^ ^^^^ ^ similar claim on S. Then C in 
Algorithm [3] is always laminar to C n S". 

Formally, it is sufficient to show that with probability at least I — S, for any 1 < i j < k,p £ Ci, 
< IQ n 5| < ^|Ci|M and 3^^^0l^dsum{p, CiHS) < d^umip, Cj n S). Notice we have 
0(dsum{p, Ci) < dsum{p, Cj) by Fact|4l(l). So we need to show with high probability: 

(1) \Ci nS\< </2\Ci\^^ and d^umip, Q n 5) < ^\§-^dsum{p, Ci); 

(2) |C,- n 5| > and dsumip, Cj r\ S) > -j=]^^dsum{p, Cj); 

max,; |C,;n5| ^ maxi \Ci\ 

mirii \Cir\S\-l - "^^JESTlt^FT' 
To prove the first claim, we show that if n is sufficiently large, then with high probability, ' ' <
v^Jj^, and davgip, CiD S) < \^davgip, Ci). On one hand, from Hoeffding's inequality, we have 



Pr 



ic^nsi 



|5| 



\Ci\ 

\x\ 



> 



\Ci\ 

\x\ 



< 2exp{-2(V2 - 1) 



1^1 



rn} < 2exp{-2(V2 - 



On the other hand, under the condition 



\c.\ 



< (^- wehave 



Pr 



davgip, CidS) > \^davg{p, Ci) 



< exp{-2(v^-l) 



davg {jPi Ci 

D 



n 2 



\Cir^S\} 



< exp{-2{^-lf^{2-^)pn}. 



Therefore, if n = @{-^^ In ^), then the first claim is true with probability at least 1 — A similar 
argument holds for the second claim. By the union bound, with probability at least 1 — 5, we have the two 
claims for any p G Ci, any Cj / Cj. Under this condition, we have for any i, ' ''' 
then the third claim follows: 



^ |ang| ^ ^\c^\ 



< 



maxj \ Ci 



< 



\s\ 



V2\X\ 



Cmin,: ICJ - 1) 



< V2 



maxj \Ci\ 
mill,- IC I — 1 



□ 



Fact 6. Suppose the clustering instance {X, d) is a-perturbation resilient to the min-sum objective where 



a > 6 



maxj \Ci\ 



If the size of the sample n 



mini IQI-l , 

minimum min-sum cost pruning of the tree in Algorithm\3\is C r\ S. 



0(^^2-^ In — ). then with probability at least 1 



5, the 



Proof. Since the tree is laminar to $\mathcal{C} \cap S$, we know that $\mathcal{C} \cap S$ is a pruning of the tree, and any other pruning that is not $\mathcal{C} \cap S$ can be obtained by splitting some clusters in $\mathcal{C} \cap S$ and joining some others into unions. Intuitively, the clusters in $\mathcal{C} \cap S$ are far apart, so the cost increased by joining different clusters is larger than the cost saved by splitting clusters. This claim then implies that $\mathcal{C} \cap S$ is the minimum cost pruning of the tree. We first prove a similar claim for $\mathcal{C}$ by the $\alpha$-perturbation resilience, i.e. for any three different clusters $C_i, C_j, C_l \in \mathcal{C}$ and any $A_X \subseteq C_i$, $\alpha d_{sum}(A_X, C_i \setminus A_X) < d_{sum}(C_j, C_l)$. Then we prove the claim for $\mathcal{C} \cap S$: for any $A \subseteq C_i \cap S$, $d_{sum}(A, C_i \cap S \setminus A) < d_{sum}(C_j \cap S, C_l \cap S)/2$. Finally we use it to prove that $\mathcal{C} \cap S$ is the minimum cost pruning.

First, for any $A_X \subseteq C_i$, we define a perturbation as follows: blow up the distances between the points in $A_X$ and those in $C_i \setminus A_X$ by a factor of $\alpha$, and keep all the other pairwise distances unchanged. By the $\alpha$-perturbation resilience, we know that $\mathcal{C}$ is still the optimum clustering after perturbation. Therefore, it has lower cost than the clustering obtained by replacing $C_i$ with $A_X$ and $C_i \setminus A_X$, and replacing $C_j$ and $C_l$ with $C_j \cup C_l$. After canceling the common terms in the costs of the two clusterings, we have $2d'_{sum}(A_X, C_i \setminus A_X) < 2d'_{sum}(C_j, C_l)$, which leads to

\[ \alpha d_{sum}(A_X, C_i \setminus A_X) < d_{sum}(C_j, C_l). \tag{7} \]
Second, we prove for $\mathcal{C} \cap S$ the following claim: for any $A \subseteq C_i \cap S$, $d_{sum}(A, C_i \cap S \setminus A) < d_{sum}(C_j \cap S, C_l \cap S)/2$. On one hand, by summing Inequality 7 over all the subsets of $C_i$, we have

a d,umiAx,Ci\Ax) < 2\^'^dsumiCj,Ci), 

OL 

'^dsum{Ci,Ci) < dsum{Cj,Ci). (8) 

The second inequality follows from the fact that for any p, q G Cj, in half of the choices of Ax, one of them 
is in Ax and the other is not. On the other hand, similar to the proof of Fact [51 we can show that with high 
probability, for any p € Q, dsum{p, Q n 5) < v^|^4«m(p, Ci). So we have 

151 

dsumiCi n S, n 5) = Yl dsum{p, anS)< ^rj^Aumip, Q) 

= ^]Y\ E d-^-rnia nS,q)< Y ^§ids^m{a, q) 

= 4/2^^dsum{Ci,Ci). (9) 

We can also show that with high probability, for any p £ Ci and any Cj{j / i), dsum{p,Cj H S) > 
~^]x\dsumip, Cj). So we have for any two different clusters Cj and Ci, 

1 l^l 

dsum (Cj ns,CinS) = Y ^sum {p,CinS)> Y ^JYl "^'"^ ^ 

= -^^\Ydsum{C,nS,q) > Y -^^\d,um{C„q) 

' ' dsumiCj,Ci). (10) 



So we have for any A C Q n 5, 

dsum{A,c^ns\A) < dsum{Cir\S,Cir\S)<</2- 

^ 1 2 dsum {Ci , Ci 



|X|2 

2^\X\ 



< r, 4/7r I v\2 dsum{Cj,Ci) < -dsum{Cj H 5, C; fl S) 



where the second inequality comes from Inequality |9j the third comes from Inequality [H and the last comes 
from Inequality [TO] 

We now use $d_{sum}(A, C_i \cap S \setminus A) < d_{sum}(C_j \cap S, C_l \cap S)/2$ to prove the optimality of $\mathcal{C} \cap S$. Suppose a pruning $\mathcal{P}^*$ is obtained by splitting $h$ clusters in $\mathcal{C} \cap S$ and at the same time joining some other clusters into $g$ unions. Specifically, for $1 \le i \le h$, split $C_i \cap S$ into $m_i \ge 2$ clusters $S_{i,1}, \ldots, S_{i,m_i}$; after that, merge $C_{h+1} \cap S, \ldots, C_{h+l_g} \cap S$ into $g$ unions, i.e. for $1 \le j \le g$, $l_0 = 0$, merge $l_j - l_{j-1} \ge 2$ clusters $C_{h+l_{j-1}+1} \cap S, \ldots, C_{h+l_j} \cap S$ into a union $U_j$; the other clusters in $\mathcal{C} \cap S$ remain the same in $\mathcal{P}^*$. Since the number of clusters is still $k$, we have $\sum_i m_i - h = l_g - g$. The cost saved by splitting $h$ clusters is

\[ \sum_{1 \le i \le h} \sum_{1 \le p \ne q \le m_i} d_{sum}(S_{i,p}, S_{i,q}) = \sum_{1 \le i \le h} \sum_{1 \le p \le m_i} d_{sum}(S_{i,p}, C_i \cap S \setminus S_{i,p}). \tag{11} \]

The cost increased by joining clusters is

\[ \sum_{1 \le j \le g} \sum_{h + l_{j-1} < p \ne q \le h + l_j} d_{sum}(C_p \cap S, C_q \cap S). \tag{12} \]

To prove that $\mathcal{C} \cap S$ is the minimum cost pruning, we need to show that the saved cost (11) is less than the increased cost (12). Since each term in (12) is more than twice as large as any term in (11), it is sufficient to show that the number of terms in (12) is at least half the number of terms in (11), i.e.

\[ \sum_{1 \le j \le g} \binom{l_j - l_{j-1}}{2} \ge \frac{1}{2} \sum_{1 \le i \le h} m_i. \]

We have $2\sum_j \binom{l_j - l_{j-1}}{2} = \sum_j (l_j - l_{j-1})(l_j - l_{j-1} - 1) \ge 2\sum_j (l_j - l_{j-1} - 1) = 2(l_g - g)$, where the inequality comes from $l_j - l_{j-1} \ge 2$. Since $l_g - g = \sum_i m_i - h$, it is sufficient to show $l_g - g \ge h$. This comes from $l_g - g = \sum_i m_i - h = \sum_i (m_i - 1) \ge \sum_i 1 = h$ since $m_i \ge 2$. □



Theorem |7l Suppose the clustering instance {X, d) is a-perturbation resilient to the min-sum objective 
where a > S^^jj^q^i^pj. Then w.p. > 1—6, we can get an implicit optimum clustering in time 0{{^^ In 
where r] = mmp^x,i<i<k davgip, Ci) the minimum average distance between points and optimal clusters. 

Proof. As mentioned above, we sample a set S of size n = 0(7^ In ^) and run Algorithm [3] on S. We 

then output the implicit clustering of the whole space $X$ that assigns each point $p \in X$ to $\tilde{C}_i \in \tilde{\mathcal{C}}$ such that $d_{sum}(p, \tilde{C}_i)$ is minimized. By Facts 5 and 6, the minimum cost pruning output by Algorithm 3 is $\mathcal{C} \cap S$. Notice that in the proof of Fact 5, we showed that for any $1 \le i \ne j \le k$, $p \in C_i$, $3\frac{\max_i |C_i \cap S|}{\min_i |C_i \cap S| - 1} d_{sum}(p, C_i \cap S) < d_{sum}(p, C_j \cap S)$. Therefore, by inserting $p$ into the $C_i \cap S$ for which $d_{sum}(p, C_i \cap S)$ is minimized, we can obtain the optimum clustering. □
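The assignment rule that turns the clustering of the sample into an implicit clustering of the whole space can be sketched in a few lines. The function name and the representation of sample clusters as lists are assumptions made for illustration; `d` is any distance function.

```python
def assign_min_sum(p, sample_clusters, d):
    """Index of the sample cluster minimizing d_sum(p, cluster)."""
    return min(range(len(sample_clusters)),
               key=lambda i: sum(d(p, q) for q in sample_clusters[i]))
```

Each query costs time linear in the sample size, independent of the size of the whole space.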



F Other Implementation Details 

F.1 Dynamic Programming to Find the Minimum Cost k-Cluster Pruning

The idea of using dynamic programming to find the optimal $k$-clustering in a tree of clusters is proposed in [2]. We can find the optimal clustering by examining the entire tree of clusters produced. Denote the cost of the optimal $m$-clustering of a tree node $p$ as $cost(p, m)$. The optimal $m$-clustering of a tree node $p$ is either the entire subtree as one cluster ($m = 1$), or the minimum over all choices of an $m_1$-clustering over its left subtree and an $m_2$-clustering over its right subtree ($1 < m \le k$), where $m_1, m_2$ are positive integers such that $m_1 + m_2 = m$. Therefore, we can traverse the tree bottom up, recursively solving the $m$-clustering problem for $1 \le m \le k$ for each tree node. The algorithm is presented in Algorithm 4. Since there are $O(n)$ nodes, and on each node $p$, computing $cost(p, 1)$ takes $O(n^2)$ time and computing $cost(p, m)$ ($1 < m \le k$) takes $O(k^2) = O(n^2)$, in total the algorithm takes time $O(n^3)$.

Notice that when $T$ is a multi-branch tree and not suitable for dynamic programming, we need to turn it into a 2-branch tree $T'$ as follows. For each node with more than 2 children, for example, the node $R$ with children $R_1, R_2, \ldots, R_t$ ($t > 2$), we first merge $R_1$ and $R_2$, then merge $R_1 \cup R_2$ with $R_3$, and repeat until we merge $R_1 \cup R_2 \cup \cdots \cup R_{t-1}$ with $R_t$ into $R$. In this way, we get a 2-branch tree $T'$ and can run dynamic programming on it. Notice that each pruning in $T$ has a corresponding pruning in $T'$, so the minimum cost pruning of $T'$ has no greater cost than the minimum cost pruning of $T$. And the running time is still $O(n^3)$.
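The conversion of a multi-branch tree into a 2-branch tree can be sketched as follows. This is a minimal illustration; the `Node` class and the function name `binarize` are hypothetical and only serve to make the left-to-right merging order concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    points: frozenset
    children: list = field(default_factory=list)


def binarize(node):
    """Return an equivalent tree in which no node has more than 2 children."""
    kids = [binarize(c) for c in node.children]
    if len(kids) <= 2:
        return Node(node.points, kids)
    # Merge R_1 with R_2, then the result with R_3, and so on.
    acc = kids[0]
    for nxt in kids[1:-1]:
        acc = Node(acc.points | nxt.points, [acc, nxt])
    # The final merge with R_t yields the original node R itself.
    return Node(node.points, [acc, kids[-1]])
```

A node with $t$ children gains $t - 2$ new internal nodes, so the tree still has $O(n)$ nodes after the conversion.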

Also notice that when the cost function is center-based, such as $k$-median, the algorithm essentially computes a center for the node $p$ when computing $cost(p, 1)$. So it can output the centers together with the pruning.

Algorithm 4 Dynamic Programming in Tree of Clusters

Input: tree of clusters $T$ on data set $S$, distance function $d(\cdot, \cdot)$ on $S$, $k$.

• Traverse $T$ bottom up, and denote the current tree node as $p$:

  • Calculate $cost(p, 1)$.

  • IF $p$ is a leaf, $cost(p, m) = cost(p, 1)$ ($1 < m \le k$),

  • ELSE $cost(p, m) = \min_{m_1 + m_2 = m} cost(p_1, m_1) + cost(p_2, m_2)$ where $p_1, p_2$ are the children of $p$.

• Traverse backwards to get the $k$-clustering that achieves $cost(r, k)$ where $r$ is the root.
Output: the $k$-clustering.
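The dynamic program can be illustrated with a short Python sketch for the $k$-median cost. It is a sketch under stated assumptions rather than a transcription of Algorithm 4: the `Node` class and function names are hypothetical, the tree is assumed to be binary, and, unlike the leaf rule in the algorithm box, a leaf is treated here as impossible to split, so entries for $m > 1$ at a leaf are simply absent.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    points: frozenset
    children: list = field(default_factory=list)  # empty, or exactly two nodes


def kmedian_cost(points, dist):
    """cost(p, 1): cost of one cluster under its best center among its points."""
    return min(sum(dist[c][x] for x in points) for c in points)


def min_cost_pruning(root, k, dist, cost=kmedian_cost):
    """Return (cost, clusters) of the cheapest k-cluster pruning of the tree."""
    def solve(node):
        # table[m] = (cost, clusters) of the best m-clustering of this subtree.
        table = {1: (cost(node.points, dist), [node.points])}
        if node.children:
            left, right = (solve(child) for child in node.children)
            for m in range(2, k + 1):
                best = (math.inf, None)
                for m1 in range(1, m):
                    m2 = m - m1
                    if m1 in left and m2 in right:
                        total = left[m1][0] + right[m2][0]
                        if total < best[0]:
                            best = (total, left[m1][1] + right[m2][1])
                if best[1] is not None:
                    table[m] = best
        return table
    return solve(root).get(k, (math.inf, None))
```

Storing the cluster lists in the table replaces the backward traversal of the algorithm box at the price of extra copying; an implementation concerned with the stated running time would store only the best split $m_1$ per entry and recover the clusters afterwards.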



F.2 An $O(n^3)$ Implementation of Algorithm 1

Here we show an $O(n^3)$ implementation of Algorithm 1, namely Algorithm 5.

Algorithm 5 Efficient Implementation of Algorithm 1
Input: Data set $S$, distance function $d(\cdot, \cdot)$ on $S$.
Phase 1: Sort all the pairwise distances in ascending order.

• For all $p \in S$ and $1 \le i \le n$, compute $L^p$ and $\chi(p, i)$. Then compute $\chi^*(p, i)$ by Equation 13.

• Let the current clustering be $n$ singleton clusters.

• For $d(p, q)$ in ascending order:

  • Suppose $q = L^p_i$. Check if $d(p, q)$ satisfies the three claims in Fact 8,

  • where the third claim can be checked by verifying if $\chi^*(p, i) = -1$.

  • If so, merge all the clusters covered by $\mathbb{B}(p, d(p, q))$.

• Construct the tree $T$ with points as leaves and internal nodes corresponding to the merges performed.
Phase 2: Apply dynamic programming on $T$ to get the minimum $k$-median cost pruning $\tilde{\mathcal{C}}$.
Output: Clustering $\tilde{\mathcal{C}}$.

Notice that at each merge step in Algorithm 1, we only need to find the two clusters with the minimum closure distance. So we hope to compute the minimum closure distance without computing all the distances between any two current clusters. First we notice the following facts.

Fact 7. In the execution of Algorithm 1, if $d$ is the minimum closure distance for the current clustering, then

(1) there exist $c, p \in S$ such that $d = d(c, p)$;

(2) $d$ is no less than the minimum closure distances in previous clusterings.

Proof. For the first claim, let $c$ be the center of the ball in the definition of closure distance, and $p$ be the farthest point from the center in the ball; then $d = d(c, p)$. The second claim comes from the fact that the clusters in the current clustering are supersets of those in previous clusterings. □

Fact 7 implies that we can check in ascending order the pairwise distances no less than the minimum closure distance in the last clustering, and determine if the checked pairwise distance is the minimum closure distance in the current clustering. More specifically, suppose we have some black-box method for checking if a pairwise distance is the minimum closure distance in the current clustering. Then we can perform the closure linkage as follows: sort the pairwise distances in a list in ascending order; start from the first distance in the list; check if the current distance is the minimum closure distance in the current clustering; if it is, merge the clusters covered by the ball defined by the checked distance; continue to check the next distance in the list. So it is sufficient to design a method to determine if a pairwise distance is the minimum closure distance in the current clustering. Our method is based on the following facts.

Fact 8. In Algorithm 1, if $d(c, p)$ is the minimum closure distance for the current clustering, then

(1) at least 2 clusters intersect $\mathbb{B}(c, d(c, p))$;

(2) all the clusters intersecting $\mathbb{B}(c, d(c, p))$ are covered by $\mathbb{B}(c, d(c, p))$;

(3) for any $p' \in \mathbb{B}(c, d(c, p))$ and $q \notin \mathbb{B}(c, d(c, p))$, $d(c, p') < d(p', q)$.

Proof. The first claim and the third claim follow from the definition. We can prove the second claim by induction. This is trivial at the beginning. Suppose it is true up to any previous clustering; we prove it for the current clustering $\mathcal{C}'$. We need to show that for any $C' \in \mathcal{C}'$ such that $C' \cap \mathbb{B}(c, d(c, p)) \ne \emptyset$, $C' \subseteq \mathbb{B}(c, d(c, p))$. If $c \in C'$, then by definition, $C' \subseteq \mathbb{B}(c, d(c, p))$. If $C'$ is a single point set $\{c_1\}$, then trivially $C' \subseteq \mathbb{B}(c, d(c, p))$. What is left is the case when $c \notin C'$ and $C'$ is generated by merging clusters in a previous step. Suppose that when $C'$ is formed, the closure distance between those clusters is defined by $c_1 \in C'$ and $p_1$. By induction, if $c \in \mathbb{B}(c_1, d(c_1, p_1))$, $c$ would have been merged into $C'$ when $C'$ was formed, which contradicts $c \notin C'$. So we have $c \notin \mathbb{B}(c_1, d(c_1, p_1))$, i.e. $d(c, c_1) > d(c_1, p_1)$. Then by the margin requirement of $\mathbb{B}(c_1, d(c_1, p_1))$, $d(c, q) > d(c_1, q)$ for any $q \in \mathbb{B}(c, d(c, p)) \cap C'$. This further leads to $c_1 \in \mathbb{B}(c, d(c, p))$, since otherwise by the margin requirement of $\mathbb{B}(c, d(c, p))$ and $q \in \mathbb{B}(c, d(c, p))$, we would have $d(c, q) < d(c_1, q)$. So for any point $q' \in C'$, since $d(c_1, q') \le d(c_1, p_1) < d(c, c_1)$, we have $q' \in \mathbb{B}(c, d(c, p))$ from the margin requirement, so $C' \subseteq \mathbb{B}(c, d(c, p))$. □

Notice if a pairwise distance satisfies the three claims, then it defines a closure distance for the clusters 
covered. So if we check the pairwise distances in ascending order, then the first one that satisfies the three 
claims must be the minimum closure distance in the current clustering. So we have a method to determine 
if a pairwise distance is the minimum closure distance. 

However, naively checking the third claim in Fact 8 takes $O(n^2)$, which is still not good enough. We can refine this step since intuitively, for every $c$, if $d(c, q)$ comes after $d(c, p)$ in the distance list, then when checking $d(c, q)$, we can utilize the information obtained from checking $d(c, p)$. Specifically, for every $p \in S$, define $L^p = (L^p_1, \ldots, L^p_n)$ to be a sorted list of the points in $S$, according to their distances to $p$ in ascending order. Let $\chi^*(p, i)$ denote the index of the farthest point in $L^p$ which makes $d(p, L^p_i)$ fail the third claim in Fact 8. Formally, define $\chi^*(p, i)$ to be the maximum $j > i$ such that there exists $s \le i$ satisfying $d(p, L^p_s) \ge d(L^p_s, L^p_j)$. If no such point exists, let $\chi^*(p, i) = -1$. Then $d(p, L^p_i)$ satisfies the third claim if and only if $\chi^*(p, i) = -1$, thus we turn the task of checking the claim into computing $\chi^*(p, i)$. In order to use the information obtained when previously checking $d(p, L^p_{i-1})$, we compute $\chi^*(p, i)$ from $\chi^*(p, i-1)$. By the definition of $\chi^*$, what is new in $\chi^*(p, i)$ compared to $\chi^*(p, i-1)$ is just the maximum $j > i$ such that $d(p, L^p_i) \ge d(L^p_i, L^p_j)$. Define $\chi(p, i)$ to be the maximum $j > i$ such that $d(p, L^p_i) \ge d(L^p_i, L^p_j)$; if no such $j$ exists, let $\chi(p, i) = -1$. Then it is easy to verify that

\[ \chi^*(p, i) = \max\{\chi^*(p, i-1), \chi(p, i)\}. \tag{13} \]

Notice that it takes $O(n)$ time to compute $\chi(p, i)$, thus we can compute $\chi^*(p, i)$ for all $p \in S$, $1 \le i \le n$ in $O(n^3)$ time. The implementation is finally summarized in Algorithm 5, which takes $O(n^3)$ time.
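The incremental computation of $\chi$ and $\chi^*$ for a single center can be sketched as follows. This is an illustrative sketch with several assumptions: the function name is hypothetical, `dist` is a distance matrix over distinct points with distinct distances from $p$, indices are 1-based to match the text, and the sketch adds a small guard that discards a carried-over index that no longer exceeds $i$, since the definition of $\chi^*(p, i)$ requires $j > i$.

```python
def chi_tables(dist, p):
    """Return (order, chi, chi_star) for center p, with 1-based indices.

    order[i] plays the role of L^p_i; -1 means "no such index".
    """
    n = len(dist)
    order = [None] + sorted(range(n), key=lambda x: dist[p][x])
    chi = [None] * (n + 1)
    chi_star = [None] * (n + 1)
    prev = -1
    for i in range(1, n + 1):
        li = order[i]
        chi[i] = -1
        for j in range(n, i, -1):  # scan from the farthest point inward
            if dist[p][li] >= dist[li][order[j]]:
                chi[i] = j
                break
        # Guard added by this sketch: a carried index must still exceed i.
        carried = prev if prev > i else -1
        chi_star[i] = max(carried, chi[i])
        prev = chi_star[i]
    return order, chi, chi_star
```

With points at positions 0, 1, 3, 3.5 and 10 on a line and center 0, the ball of radius 3 fails the third claim because the point at 3.5 is closer to the point at 3 than that point is to the center, while the ball of radius 3.5 passes.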